.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source.  A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright 2020 Robert Mustacchi
.\"
.Dd April 23, 2020
.Dt C16RTOMB 3C
.Os
.Sh NAME
.Nm c16rtomb ,
.Nm c32rtomb ,
.Nm wcrtomb ,
.Nm wcrtomb_l
.Nd convert wide-characters to character sequences
.Sh SYNOPSIS
.In uchar.h
.Ft size_t
.Fo c16rtomb
.Fa "char *restrict str"
.Fa "char16_t c16"
.Fa "mbstate_t *restrict ps"
.Fc
.Ft size_t
.Fo c32rtomb
.Fa "char *restrict str"
.Fa "char32_t c32"
.Fa "mbstate_t *restrict ps"
.Fc
.In wchar.h
.Ft size_t
.Fo wcrtomb
.Fa "char *restrict str"
.Fa "wchar_t wc"
.Fa "mbstate_t *restrict ps"
.Fc
.In wchar.h
.In xlocale.h
.Ft size_t
.Fo wcrtomb_l
.Fa "char *restrict str"
.Fa "wchar_t wc"
.Fa "mbstate_t *restrict ps"
.Fa "locale_t loc"
.Fc
.Sh DESCRIPTION
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions convert wide-character sequences into a series of multi-byte
characters.
Each function accepts input in a different format:
.Bl -tag -width wcrtomb_l
.It Fn c16rtomb
A UTF-16 code sequence, where every code point is represented by one or
two
.Vt char16_t .
The UTF-16 encoding represents certain Unicode code points as a pair of
16-bit code units, commonly referred to as a surrogate pair.
.It Fn c32rtomb
A UTF-32 code sequence, where every code point is represented by a
single
.Vt char32_t .
It is illegal to pass reserved Unicode code points.
.It Fn wcrtomb , Fn wcrtomb_l
Wide characters, being a 32-bit value where every code point is
represented by a single
.Vt wchar_t .
While
.Vt wchar_t
and
.Vt char32_t
are different types, in this implementation they use similar encodings.
.El
.Pp
The functions all work by looking at the passed-in wide character
.Po
.Fa c16 ,
.Fa c32 ,
.Fa wc
.Pc
and appending it to the current conversion state,
.Fa ps .
Once a valid code point, based on the current locale, is found, it is
converted into a series of characters that are stored in
.Fa str .
Up to
.Dv MB_CUR_MAX
bytes will be stored in
.Fa str .
It is the caller's responsibility to ensure that there is sufficient
space in
.Fa str .
.Pp
The functions are all influenced by the
.Dv LC_CTYPE
category of the current locale for determining what is considered a
valid character.
For example, in the
.Sy C
locale,
only ASCII characters are recognized, while in a
.Sy UTF-8
based locale like
.Sy en_US.UTF-8 ,
all valid Unicode code points are recognized and will be converted into
the corresponding multi-byte sequence.
The
.Fn wcrtomb_l
function uses the locale passed in
.Fa loc
rather than the locale of the current thread.
.Pp
The
.Fa ps
argument represents a multi-byte conversion state which can be used
across multiple calls to a given function
.Pq but not mixed between functions .
This allows a conversion to continue across calls that use different
buffers, that is, different values of
.Fa str .
The functions may be called from multiple threads as long as they use
unique values for
.Fa ps .
If
.Fa ps
is
.Dv NULL ,
then a function-specific buffer will be used for the conversion state;
however, this buffer is shared between all threads and its use is not
recommended.
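.Pp
As a minimal sketch of the points above, the following helper converts
an array of
.Vt char32_t
values while keeping its own
.Vt mbstate_t
and staging each character in a buffer of
.Dv MB_LEN_MAX
bytes.
The name
.Fn c32s_to_mbs
and its interface are hypothetical and exist only for illustration.
.Bd -literal
```c
#include <limits.h>
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: convert n char32_t values from src into dst,
 * which holds dstlen bytes.  Returns the number of bytes written, or
 * (size_t)-1 on an encoding error or if dst is too small.
 */
size_t
c32s_to_mbs(char *dst, size_t dstlen, const char32_t *src, size_t n)
{
        mbstate_t mbs;
        char tmp[MB_LEN_MAX];
        size_t used = 0;

        /* A zeroed mbstate_t is the initial conversion state. */
        (void) memset(&mbs, 0, sizeof (mbs));
        for (size_t i = 0; i < n; i++) {
                /* tmp always has room for one converted character. */
                size_t ret = c32rtomb(tmp, src[i], &mbs);

                if (ret == (size_t)-1 || ret > dstlen - used)
                        return ((size_t)-1);
                (void) memcpy(dst + used, tmp, ret);
                used += ret;
        }

        return (used);
}
```
.Ed
.Pp
Because the state lives on the caller's stack, separate threads can run
this helper concurrently without sharing any conversion state.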
.Pp
The functions all have a special behavior when
.Dv NULL
is passed for
.Fa str .
Instead, they behave as though the null wide-character had been
passed in
.Fa c16 ,
.Fa c32 ,
or
.Fa wc
and an internal buffer
.Pq buf
is used to write out the results of the
conversion.
In other words, the functions would be called as:
.Bd -literal -offset indent
c16rtomb(buf, L'\\0', ps)
c32rtomb(buf, L'\\0', ps)
wcrtomb(buf, L'\\0', ps)
wcrtomb_l(buf, L'\\0', ps, loc)
.Ed
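.Pp
One use of this form, sketched here under the assumption that the
implementation follows the C standard's rule that converting the null
wide-character leaves the state in its initial condition, is to reset a
caller-owned conversion state.
The helper name
.Fn reset_c32_state
is hypothetical.
.Bd -literal
```c
#include <stddef.h>
#include <uchar.h>
#include <wchar.h>

/*
 * Hypothetical helper: return a caller-owned conversion state to its
 * initial state by passing NULL for str.  Returns 0 on success and -1
 * if the call fails or the state is not initial afterwards.
 */
int
reset_c32_state(mbstate_t *ps)
{
        /* Behaves as c32rtomb(buf, 0, ps) with an internal buf. */
        if (c32rtomb(NULL, 0, ps) == (size_t)-1)
                return (-1);

        /* mbsinit() reports whether ps describes the initial state. */
        return (mbsinit(ps) != 0 ? 0 : -1);
}
```
.Ed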
.Ss Locale Details
Not all locales in the system are Unicode based locales.
For example, ISO 8859 family locales have code points with values that
do not match their counterparts in Unicode.
When using these functions with non-Unicode based locales, the code
points returned will be those determined by the locale.
They will not be converted from the corresponding Unicode code point.
For example, if using the Euro sign in ISO 8859-15, these functions
will not encode the Unicode value 0x20ac into the ISO 8859-15 value
0xa4.
.Pp
Regardless of the locale, the characters returned will be encoded as
though the code point were the corresponding value in Unicode.
This means that when using UTF-16, if the corresponding code point were
in the range for surrogate pairs, then the
.Fn c16rtomb
function will expect to receive that code point in that fashion.
.Pp
This behavior of the
.Fn c16rtomb
and
.Fn c32rtomb
functions should not be relied upon, is not portable, and is subject to
change for non-Unicode locales.
.Sh RETURN VALUES
Upon successful completion, the
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions return the number of bytes stored in
.Fa str .
Otherwise,
.Sy (size_t)-1
is returned to indicate an encoding error and
.Va errno
is set.
.Sh EXAMPLES
.Sy Example 1
Converting a UTF-32 character into a multi-byte character sequence.
.Bd -literal
#include <locale.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <err.h>
#include <stdio.h>
#include <uchar.h>

int
main(void)
{
        mbstate_t mbs;
        size_t ret;
        /* MB_CUR_MAX depends on the locale; MB_LEN_MAX always fits. */
        char buf[MB_LEN_MAX];
        char32_t val = 0x5149;
        const char *uchar_exp = "\exe5\ex85\ex89";

        /* Start from the initial conversion state. */
        (void) memset(&mbs, 0, sizeof (mbs));
        (void) setlocale(LC_CTYPE, "en_US.UTF-8");
        ret = c32rtomb(buf, val, &mbs);
        if (ret != strlen(uchar_exp)) {
                errx(EXIT_FAILURE, "failed to convert string, got %zu",
                    ret);
        }

        /* buf is not NUL-terminated, so compare only ret bytes. */
        if (strncmp(buf, uchar_exp, ret) != 0) {
                errx(EXIT_FAILURE, "converted char32_t does not match "
                    "expected value");
        }

        return (0);
}
.Ed
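.Pp
.Sy Example 2
Converting a UTF-16 surrogate pair into a multi-byte character sequence.
.Pp
This is a sketch rather than a definitive implementation.
The helper name
.Fn encode_surrogate_pair
is hypothetical, the locale names are assumptions about what is
installed, and the sketch assumes that the first call of a pair stores
no bytes, as the C standard describes.
.Bd -literal
```c
#include <locale.h>
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: encode one UTF-16 surrogate pair into buf, which
 * must hold at least MB_LEN_MAX bytes.  Returns the number of bytes
 * stored, or (size_t)-1 on failure.  Changes the process locale.
 */
size_t
encode_surrogate_pair(char *buf, char16_t high, char16_t low)
{
        mbstate_t mbs;
        size_t ret;

        /* Locale names are assumptions; adjust for the target system. */
        if (setlocale(LC_CTYPE, "en_US.UTF-8") == NULL &&
            setlocale(LC_CTYPE, "C.UTF-8") == NULL)
                return ((size_t)-1);

        (void) memset(&mbs, 0, sizeof (mbs));

        /* The high surrogate is only recorded in mbs; no bytes yet. */
        ret = c16rtomb(buf, high, &mbs);
        if (ret != 0)
                return ((size_t)-1);

        /* The low surrogate completes the code point. */
        return (c16rtomb(buf, low, &mbs));
}
```
.Ed
.Pp
For U+1F600, which UTF-16 encodes as 0xd83d followed by 0xde00, a UTF-8
locale is expected to produce the four bytes 0xf0 0x9f 0x98 0x80.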
.Sh ERRORS
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions will fail if:
.Bl -tag -width Er
.It Er EINVAL
The conversion state in
.Fa ps
is invalid.
.It Er EILSEQ
An invalid character sequence has been detected.
.El
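.Pp
The following sketch shows one way a caller might check for these
errors.
The helper name
.Fn c32_convert_errno
is hypothetical, and the sketch forces the
.Sy C
locale so that a non-ASCII code point is rejected.
.Bd -literal
```c
#include <errno.h>
#include <limits.h>
#include <locale.h>
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: try to convert c32 in the C locale.  Returns 0
 * on success or the errno value set by c32rtomb() on failure.
 */
int
c32_convert_errno(char32_t c32)
{
        mbstate_t mbs;
        char buf[MB_LEN_MAX];

        (void) setlocale(LC_CTYPE, "C");
        (void) memset(&mbs, 0, sizeof (mbs));

        errno = 0;
        if (c32rtomb(buf, c32, &mbs) == (size_t)-1)
                return (errno);

        return (0);
}
```
.Ed
.Pp
With this helper, an ASCII value such as 0x41 is expected to succeed,
while 0x5149 is expected to fail with
.Er EILSEQ .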
.Sh MT-LEVEL
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions are
.Sy MT-Safe
as long as different
.Vt mbstate_t
structures are passed in
.Fa ps .
If
.Fa ps
is
.Dv NULL
or different threads use the same value for
.Fa ps ,
then the functions are
.Sy Unsafe .
.Sh INTERFACE STABILITY
.Sy Committed
.Sh SEE ALSO
.Xr mbrtoc16 3C ,
.Xr mbrtoc32 3C ,
.Xr mbrtowc 3C ,
.Xr newlocale 3C ,
.Xr setlocale 3C ,
.Xr uselocale 3C ,
.Xr uchar.h 3HEAD ,
.Xr environ 5