.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source.  A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright 2020 Robert Mustacchi
.\"
.Dd December 2, 2023
.Dt C16RTOMB 3C
.Os
.Sh NAME
.Nm c16rtomb ,
.Nm c32rtomb ,
.Nm wcrtomb ,
.Nm wcrtomb_l
.Nd convert wide-characters to character sequences
.Sh SYNOPSIS
.In uchar.h
.Ft size_t
.Fo c16rtomb
.Fa "char *restrict str"
.Fa "char16_t c16"
.Fa "mbstate_t *restrict ps"
.Fc
.Ft size_t
.Fo c32rtomb
.Fa "char *restrict str"
.Fa "char32_t c32"
.Fa "mbstate_t *restrict ps"
.Fc
.In wchar.h
.Ft size_t
.Fo wcrtomb
.Fa "char *restrict str"
.Fa "wchar_t wc"
.Fa "mbstate_t *restrict ps"
.Fc
.In wchar.h
.In xlocale.h
.Ft size_t
.Fo wcrtomb_l
.Fa "char *restrict str"
.Fa "wchar_t wc"
.Fa "mbstate_t *restrict ps"
.Fa "locale_t loc"
.Fc
.Sh DESCRIPTION
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions convert wide-character sequences into a series of multi-byte
characters.
The functions work in the following formats:
.Bl -tag -width wcrtomb_l
.It Fn c16rtomb
A UTF-16 code sequence, where every code point is represented by one or
two
.Vt char16_t .
The UTF-16 encoding will encode certain Unicode code points as a pair of
16-bit code units, commonly referred to as a surrogate pair.
.It Fn c32rtomb
A UTF-32 code sequence, where every code point is represented by a
single
.Vt char32_t .
It is illegal to pass reserved Unicode code points.
.It Fn wcrtomb , Fn wcrtomb_l
Wide characters, being a 32-bit value where every code point is
represented by a single
.Vt wchar_t .
While the
.Vt wchar_t
and
.Vt char32_t
are different types, in this implementation, they are similar encodings.
.El
.Pp
The functions all work by looking at the passed in wide-character
.Po
.Fa c16 ,
.Fa c32 ,
.Fa wc
.Pc
and appending it to the current conversion state,
.Fa ps .
Once a valid code point, based on the current locale, is found, then it
will be converted into a series of characters that are stored in
.Fa str .
Up to
.Dv MB_CUR_MAX
bytes will be stored in
.Fa str .
It is the caller's responsibility to ensure that there is sufficient
space in
.Fa str .
.Pp
The functions are all influenced by the
.Dv LC_CTYPE
category of the current locale for determining what is considered a
valid character.
For example, in the
.Sy C
locale,
only ASCII characters are recognized, while in a
.Sy UTF-8
based locale like
.Sy en_US.UTF-8 ,
all valid Unicode code points are recognized and will be converted into
the corresponding multi-byte sequence.
The
.Fn wcrtomb_l
function uses the locale passed in
.Fa loc
rather than the locale of the current thread.
.Pp
The
.Fa ps
argument represents a multi-byte conversion state which can be used
across multiple calls to a given function
.Pq but not mixed between functions .
This allows successive calls to store their output in different
buffers, that is, different values of
.Fa str .
The functions may be called from multiple threads as long as they use
unique values for
.Fa ps .
If
.Fa ps
is
.Dv NULL ,
then a function-specific buffer will be used for the conversion state;
however, this buffer is shared between all threads and its use is not
recommended.
.Pp
The functions all have a special behavior when
.Dv NULL
is passed for
.Fa str .
They will instead behave as though the null wide-character was
passed in
.Fa c16 ,
.Fa c32 ,
or
.Fa wc
and an internal buffer
.Pq buf
will be used to write out the results of the
conversion.
In other words, the functions would be called as:
.Bd -literal -offset indent
c16rtomb(buf, L'\e0', ps)
c32rtomb(buf, L'\e0', ps)
wcrtomb(buf, L'\e0', ps)
wcrtomb_l(buf, L'\e0', ps, loc)
.Ed
.Ss Locale Details
Not all locales in the system are Unicode-based locales.
For example, ISO 8859 family locales have code points with values that
do not match their counterparts in Unicode.
When using these functions with non-Unicode locales, the code
points returned will be those determined by the locale.
They will not be converted from the corresponding Unicode code point.
For example, if using the Euro sign in ISO 8859-15, these functions
will not encode the Unicode value 0x20ac into the ISO 8859-15 value
0xa4.
.Pp
Regardless of the locale, the characters returned will be encoded as
though the code point were the corresponding value in Unicode.
This means that when using UTF-16, if the corresponding code point were
in the range for surrogate pairs, then the
.Fn c16rtomb
function will expect to receive that code point in that fashion.
.Pp
This behavior of the
.Fn c16rtomb
and
.Fn c32rtomb
functions should not be relied upon, is not portable, and is subject to
change for non-Unicode locales.
.Sh RETURN VALUES
Upon successful completion, the
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions return the number of bytes stored in
.Fa str .
Otherwise,
.Sy (size_t)-1
is returned to indicate an encoding error and
.Va errno
is set.
.Sh EXAMPLES
.Sy Example 1
Converting a UTF-32 character into a multi-byte character sequence.
.Bd -literal
#include <locale.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <err.h>
#include <stdio.h>
#include <uchar.h>

int
main(void)
{
        mbstate_t mbs;
        size_t ret;
        /*
         * Size the buffer with MB_LEN_MAX: MB_CUR_MAX would be evaluated
         * here, before setlocale() selects the UTF-8 locale.
         */
        char buf[MB_LEN_MAX];
        char32_t val = 0x5149;
        const char *uchar_exp = "\exe5\ex85\ex89";

        (void) memset(&mbs, 0, sizeof (mbs));
        (void) setlocale(LC_CTYPE, "en_US.UTF-8");
        ret = c32rtomb(buf, val, &mbs);
        if (ret != strlen(uchar_exp)) {
                errx(EXIT_FAILURE, "failed to convert string, got %zu",
                    ret);
        }

        if (strncmp(buf, uchar_exp, ret) != 0) {
                errx(EXIT_FAILURE, "converted char32_t does not match "
                    "expected value");
        }

        return (0);
}
.Ed
.Sh ERRORS
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions will fail if:
.Bl -tag -width Er
.It Er EINVAL
The conversion state in
.Fa ps
is invalid.
.It Er EILSEQ
An invalid character sequence has been detected.
.El
.Sh MT-LEVEL
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions are
.Sy MT-Safe
as long as different
.Vt mbstate_t
structures are passed in
.Fa ps .
If
.Fa ps
is
.Dv NULL
or different threads use the same value for
.Fa ps ,
then the functions are
.Sy Unsafe .
.Sh INTERFACE STABILITY
.Sy Committed
.Sh SEE ALSO
.Xr mbrtoc16 3C ,
.Xr mbrtoc32 3C ,
.Xr mbrtowc 3C ,
.Xr newlocale 3C ,
.Xr setlocale 3C ,
.Xr uselocale 3C ,
.Xr uchar.h 3HEAD ,
.Xr environ 7