.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source.  A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright 2020 Robert Mustacchi
.\"
.Dd April 23, 2020
.Dt C16RTOMB 3C
.Os
.Sh NAME
.Nm c16rtomb ,
.Nm c32rtomb ,
.Nm wcrtomb ,
.Nm wcrtomb_l
.Nd convert wide-characters to character sequences
.Sh SYNOPSIS
.In uchar.h
.Ft size_t
.Fo c16rtomb
.Fa "char *restrict str"
.Fa "char16_t c16"
.Fa "mbstate_t *restrict ps"
.Fc
.Ft size_t
.Fo c32rtomb
.Fa "char *restrict str"
.Fa "char32_t c32"
.Fa "mbstate_t *restrict ps"
.Fc
.In wchar.h
.Ft size_t
.Fo wcrtomb
.Fa "char *restrict str"
.Fa "wchar_t wc"
.Fa "mbstate_t *restrict ps"
.Fc
.In wchar.h
.In xlocale.h
.Ft size_t
.Fo wcrtomb_l
.Fa "char *restrict str"
.Fa "wchar_t wc"
.Fa "mbstate_t *restrict ps"
.Fa "locale_t loc"
.Fc
.Sh DESCRIPTION
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions convert wide-character sequences into a series of multi-byte
characters.
The functions work in the following formats:
.Bl -tag -width wcrtomb_l
.It Fn c16rtomb
A UTF-16 code sequence, where every code point is represented by one or
two
.Vt char16_t .
The UTF-16 encoding will encode certain Unicode code points as a pair of
16-bit code units, commonly referred to as a surrogate pair.
.It Fn c32rtomb
A UTF-32 code sequence, where every code point is represented by a
single
.Vt char32_t .
It is illegal to pass reserved Unicode code points.
.It Fn wcrtomb , Fn wcrtomb_l
Wide characters, being a 32-bit value where every code point is
represented by a single
.Vt wchar_t .
While
.Vt wchar_t
and
.Vt char32_t
are different types, in this implementation, they are similar encodings.
.El
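.Pp
As a rough illustration of the surrogate pair arithmetic that UTF-16
implies, the following sketch combines two 16-bit code units into a
single code point.
The helper name
.Fn combine_surrogates
is hypothetical and is not part of any library; it only shows the
encoding rule and makes no claim about how
.Fn c16rtomb
is implemented.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: combine a UTF-16 high
 * surrogate (0xd800-0xdbff) and low surrogate (0xdc00-0xdfff) into the
 * code point they encode.  Returns UINT32_MAX if the two values do not
 * form a valid surrogate pair.
 */
uint32_t
combine_surrogates(uint16_t hi, uint16_t lo)
{
	if (hi < 0xd800 || hi > 0xdbff || lo < 0xdc00 || lo > 0xdfff)
		return (UINT32_MAX);

	/* Each surrogate carries 10 bits; the pair is offset by 0x10000. */
	return (0x10000 + (((uint32_t)(hi - 0xd800) << 10) |
	    (uint32_t)(lo - 0xdc00)));
}
```

For instance, the pair 0xd83d, 0xde00 corresponds to the code point
0x1f600.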
.Pp
The functions all work by looking at the passed-in wide character
.Po
.Fa c16 ,
.Fa c32 ,
.Fa wc
.Pc
and appending it to the current conversion state,
.Fa ps .
Once a valid code point, based on the current locale, is found, it
will be converted into a series of characters that are stored in
.Fa str .
Up to
.Dv MB_CUR_MAX
bytes will be stored in
.Fa str .
It is the caller's responsibility to ensure that there is sufficient
space in
.Fa str .
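.Pp
One way a caller might meet the space requirement is to convert each
character into a scratch buffer of
.Dv MB_LEN_MAX
bytes and copy it out only when it fits.
The following is a minimal sketch under that assumption.
The helper
.Fn wide_to_multibyte
is a hypothetical name, and the sketch assumes a stateless encoding
such as UTF-8, so no trailing shift sequence is emitted.

```c
#include <limits.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper, for illustration only: convert the wide string
 * ws into dst, which holds dstlen bytes, and NUL-terminate the result.
 * Returns the number of bytes written, not counting the terminating
 * NUL, or (size_t)-1 on an encoding error or if dst is too small.
 */
size_t
wide_to_multibyte(char *dst, size_t dstlen, const wchar_t *ws)
{
	mbstate_t mbs;
	char tmp[MB_LEN_MAX];	/* large enough for any single character */
	size_t used = 0;

	(void) memset(&mbs, 0, sizeof (mbs));
	for (; *ws != L'\0'; ws++) {
		size_t n = wcrtomb(tmp, *ws, &mbs);

		if (n == (size_t)-1)
			return ((size_t)-1);
		/* Always leave room for the terminating NUL byte. */
		if (n >= dstlen - used)
			return ((size_t)-1);
		(void) memcpy(dst + used, tmp, n);
		used += n;
	}

	if (used >= dstlen)
		return ((size_t)-1);
	dst[used] = '\0';
	return (used);
}
```

Sizing the scratch buffer with
.Dv MB_LEN_MAX
rather than
.Dv MB_CUR_MAX
keeps it a compile-time constant that does not depend on the locale in
effect when the buffer is declared.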
.Pp
The functions are all influenced by the
.Dv LC_CTYPE
category of the current locale for determining what is considered a
valid character.
For example, in the
.Sy C
locale,
only ASCII characters are recognized, while in a
.Sy UTF-8
based locale like
.Sy en_US.UTF-8 ,
all valid Unicode code points are recognized and will be converted into
the corresponding multi-byte sequence.
The
.Fn wcrtomb_l
function uses the locale passed in
.Fa loc
rather than the locale of the current thread.
.Pp
The
.Fa ps
argument represents a multi-byte conversion state which can be used
across multiple calls to a given function
.Pq but not mixed between functions .
This allows for characters to be consumed from subsequent buffers, e.g.
different values of
.Fa str .
The functions may be called from multiple threads as long as they use
unique values for
.Fa ps .
If
.Fa ps
is
.Dv NULL ,
then a function-specific buffer will be used for the conversion state;
however, this is shared between all threads and its use is not
recommended.
.Pp
The functions all have a special behavior when
.Dv NULL
is passed for
.Fa str .
They instead will treat it as though the null wide-character was
passed in
.Fa c16 ,
.Fa c32 ,
or
.Fa wc
and an internal buffer
.Pq buf
will be used to write out the results of the
conversion.
In other words, the functions would be called as:
.Bd -literal -offset indent
c16rtomb(buf, L'\\0', ps)
c32rtomb(buf, L'\\0', ps)
wcrtomb(buf, L'\\0', ps)
wcrtomb_l(buf, L'\\0', ps, loc)
.Ed
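.Pp
A caller could rely on this behavior to return a conversion state to
its initial state without supplying an output buffer.
The following sketch wraps that call in a hypothetical helper,
.Fn bytes_to_reset ;
the name is an assumption made for illustration.

```c
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper, for illustration only: pass NULL for str so that
 * the function behaves as if the null wide-character had been
 * converted into an internal buffer.  Returns the number of bytes that
 * conversion produced, which includes the terminating NUL byte.
 */
size_t
bytes_to_reset(mbstate_t *ps)
{
	/* The wide-character argument is ignored when str is NULL. */
	return (wcrtomb(NULL, L'\0', ps));
}
```

For a state that is already in its initial state and a stateless
encoding, the only byte produced is the terminating NUL.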
.Ss Locale Details
Not all locales in the system are Unicode based locales.
For example, ISO 8859 family locales have code points with values that
do not match their counterparts in Unicode.
When using these functions with non-Unicode based locales, the code
points returned will be those determined by the locale.
They will not be converted from the corresponding Unicode code point.
For example, if using the Euro sign in ISO 8859-15, these functions
will not encode the Unicode value 0x20ac into the ISO 8859-15 value
0xa4.
.Pp
Regardless of the locale, the characters returned will be encoded as
though the code point were the corresponding value in Unicode.
This means that when using UTF-16, if the corresponding code point were
in the range for surrogate pairs, then the
.Fn c16rtomb
function will expect to receive that code point in that fashion.
.Pp
This behavior of the
.Fn c16rtomb
and
.Fn c32rtomb
functions should not be relied upon, is not portable, and is subject to
change for non-Unicode locales.
.Sh RETURN VALUES
Upon successful completion, the
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions return the number of bytes stored in
.Fa str .
Otherwise,
.Sy (size_t)-1
is returned to indicate an encoding error and
.Va errno
is set.
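.Pp
The return value convention can be checked as in the following sketch.
The helper
.Fn convert_one
is a hypothetical name, and the sketch assumes the program is still
running in the default
.Sy C
locale, where a non-ASCII character is expected to be rejected.

```c
#include <errno.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper, for illustration only: convert wc into buf,
 * which must hold at least MB_LEN_MAX bytes.  Returns the number of
 * bytes stored, or -1 on failure with the errno value saved in *errp.
 */
int
convert_one(char *buf, wchar_t wc, int *errp)
{
	mbstate_t mbs;
	size_t ret;

	(void) memset(&mbs, 0, sizeof (mbs));
	errno = 0;
	ret = wcrtomb(buf, wc, &mbs);
	if (ret == (size_t)-1) {
		/* Save errno before any other call can overwrite it. */
		*errp = errno;
		return (-1);
	}

	*errp = 0;
	return ((int)ret);
}
```

Comparing against
.Sy (size_t)-1
explicitly matters because
.Vt size_t
is unsigned, so a test such as "ret < 0" can never be true.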
.Sh EXAMPLES
.Sy Example 1
Converting a UTF-32 character into a multi-byte character sequence.
.Bd -literal
#include <locale.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <err.h>
#include <stdio.h>
#include <uchar.h>

int
main(void)
{
        mbstate_t mbs;
        size_t ret;
        /*
         * MB_CUR_MAX depends on the locale and is still 1 before the
         * setlocale() call below; MB_LEN_MAX is always large enough.
         */
        char buf[MB_LEN_MAX];
        char32_t val = 0x5149;
        const char *uchar_exp = "\exe5\ex85\ex89";

        (void) memset(&mbs, 0, sizeof (mbs));
        (void) setlocale(LC_CTYPE, "en_US.UTF-8");
        ret = c32rtomb(buf, val, &mbs);
        if (ret != strlen(uchar_exp)) {
                errx(EXIT_FAILURE, "failed to convert string, got %zu",
                    ret);
        }

        if (strncmp(buf, uchar_exp, ret) != 0) {
                errx(EXIT_FAILURE, "converted char32_t does not match "
                    "expected value");
        }

        return (0);
}
.Ed
.Sh ERRORS
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions will fail if:
.Bl -tag -width Er
.It Er EINVAL
The conversion state in
.Fa ps
is invalid.
.It Er EILSEQ
An invalid character sequence has been detected.
.El
.Sh MT-LEVEL
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions are
.Sy MT-Safe
as long as different
.Vt mbstate_t
structures are passed in
.Fa ps .
If
.Fa ps
is
.Dv NULL
or different threads use the same value for
.Fa ps ,
then the functions are
.Sy Unsafe .
.Sh INTERFACE STABILITY
.Sy Committed
.Sh SEE ALSO
.Xr mbrtoc16 3C ,
.Xr mbrtoc32 3C ,
.Xr mbrtowc 3C ,
.Xr newlocale 3C ,
.Xr setlocale 3C ,
.Xr uselocale 3C ,
.Xr uchar.h 3HEAD ,
.Xr environ 5