.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source.  A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright 2020 Robert Mustacchi
.\"
.Dd April 23, 2020
.Dt C16RTOMB 3C
.Os
.Sh NAME
.Nm c16rtomb ,
.Nm c32rtomb ,
.Nm wcrtomb ,
.Nm wcrtomb_l
.Nd convert wide-characters to character sequences
.Sh SYNOPSIS
.In uchar.h
.Ft size_t
.Fo c16rtomb
.Fa "char *restrict str"
.Fa "char16_t c16"
.Fa "mbstate_t *restrict ps"
.Fc
.Ft size_t
.Fo c32rtomb
.Fa "char *restrict str"
.Fa "char32_t c32"
.Fa "mbstate_t *restrict ps"
.Fc
.In wchar.h
.Ft size_t
.Fo wcrtomb
.Fa "char *restrict str"
.Fa "wchar_t wc"
.Fa "mbstate_t *restrict ps"
.Fc
.In wchar.h
.In xlocale.h
.Ft size_t
.Fo wcrtomb_l
.Fa "char *restrict str"
.Fa "wchar_t wc"
.Fa "mbstate_t *restrict ps"
.Fa "locale_t loc"
.Fc
.Sh DESCRIPTION
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions convert wide-character sequences into a series of multi-byte
characters.
The functions work in the following formats:
.Bl -tag -width wcrtomb_l
.It Fn c16rtomb
A UTF-16 code sequence, where every code point is represented by one or
two
.Vt char16_t .
The UTF-16 encoding will encode certain Unicode code points as a pair of
two 16-bit code sequences, commonly referred to as a surrogate pair.
.It Fn c32rtomb
A UTF-32 code sequence, where every code point is represented by a
single
.Vt char32_t .
It is illegal to pass reserved Unicode code points.
.It Fn wcrtomb , Fn wcrtomb_l
Wide characters, being a 32-bit value where every code point is
represented by a single
.Vt wchar_t .
While the
.Vt wchar_t
and
.Vt char32_t
are different types, in this implementation, they are similar encodings.
.El
.Pp
The functions all work by looking at the passed in wide-character
.Po
.Fa c16 ,
.Fa c32 ,
.Fa wc
.Pc
and appending it to the current conversion state,
.Fa ps .
Once a valid code point, based on the current locale, is found, then it
will be converted into a series of characters that are stored in
.Fa str .
Up to
.Dv MB_CUR_MAX
bytes will be stored in
.Fa str .
It is the caller's responsibility to ensure that there is sufficient
space in
.Fa str .
.Pp
The functions are all influenced by the
.Dv LC_CTYPE
category of the current locale for determining what is considered a
valid character.
For example, in the
.Sy C
locale,
only ASCII characters are recognized, while in a
.Sy UTF-8
based locale like
.Sy en_US.UTF-8 ,
all valid Unicode code points are recognized and will be converted into
the corresponding multi-byte sequence.
The
.Fn wcrtomb_l
function uses the locale passed in
.Fa loc
rather than the locale of the current thread.
.Pp
The
.Fa ps
argument represents a multi-byte conversion state which can be used
across multiple calls to a given function
.Pq but not mixed between functions .
This allows the output of successive calls to be written to different
buffers, e.g. different values of
.Fa str .
The functions may be called from multiple threads as long as they use
unique values for
.Fa ps .
If
.Fa ps
is
.Dv NULL ,
then a function-specific buffer will be used for the conversion state;
however, this is shared between all threads and its use is not
recommended.
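.Pp
The following is a minimal sketch of carrying one conversion state
through a series of calls, here converting a whole wide-character
string with wcrtomb.
The function name wide_to_mb and its return convention are assumptions
made for this illustration and are not part of the library.
Each character is first converted into a scratch buffer of MB_LEN_MAX
bytes so that the destination is never overrun.

```c
#include <limits.h>
#include <string.h>
#include <wchar.h>

/*
 * Convert the wide string src into dst, passing the same mbstate_t to
 * every wcrtomb() call. Returns the number of bytes written, including
 * the terminating null byte, or (size_t)-1 on a conversion error or if
 * dst is too small. Hypothetical helper for illustration only.
 */
static size_t
wide_to_mb(char *dst, size_t dstlen, const wchar_t *src)
{
        mbstate_t mbs;
        char tmp[MB_LEN_MAX];
        size_t used = 0;

        (void) memset(&mbs, 0, sizeof (mbs));
        for (;;) {
                size_t ret = wcrtomb(tmp, *src, &mbs);

                /* Stop on an encoding error or when dst is full. */
                if (ret == (size_t)-1 || ret > dstlen - used)
                        return ((size_t)-1);
                (void) memcpy(dst + used, tmp, ret);
                used += ret;
                /* The null wide-character has just been converted. */
                if (*src == 0)
                        return (used);
                src++;
        }
}
```

Because the state is a local variable, concurrent callers of this
sketch never share a conversion state.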
.Pp
The functions all have a special behavior when
.Dv NULL
is passed for
.Fa str .
They instead will treat it as though the null wide-character was
passed in
.Fa c16 ,
.Fa c32 ,
or
.Fa wc
and an internal buffer
.Pq buf
will be used to write out the results of the
conversion.
In other words, the functions would be called as:
.Bd -literal -offset indent
c16rtomb(buf, L'\\0', ps)
c32rtomb(buf, L'\\0', ps)
wcrtomb(buf, L'\\0', ps)
wcrtomb_l(buf, L'\\0', ps, loc)
.Ed
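.Pp
One use of this behavior, sketched below, is returning a conversion
state to its initial state without supplying an output buffer.
The wrapper name reset_state is hypothetical.
The wide-character argument is arbitrary here, since it is treated as
the null wide-character whenever str is NULL.

```c
#include <wchar.h>

/*
 * Return ps to the initial conversion state by passing NULL for str.
 * The result is whatever wcrtomb() returns: the number of bytes that
 * converting the null wide-character would have stored, or (size_t)-1
 * on error. Hypothetical helper for illustration only.
 */
static size_t
reset_state(mbstate_t *ps)
{
        return (wcrtomb(NULL, L'x', ps));
}
```

In an encoding without shift states, and starting from an initial
state, the return value should be 1, which accounts for the single
null byte.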
.Ss Locale Details
Not all locales in the system are Unicode based locales.
For example, ISO 8859 family locales have code points with values that
do not match their counterparts in Unicode.
When using these functions with non-Unicode based locales, the code
points returned will be those determined by the locale.
They will not be converted from the corresponding Unicode code point.
For example, if using the Euro sign in ISO 8859-15, these functions
will not encode the Unicode value 0x20ac into the ISO 8859-15 value
0xa4.
.Pp
Regardless of the locale, the characters returned will be encoded as
though the code point were the corresponding value in Unicode.
This means that when using UTF-16, if the corresponding code point were
in the range for surrogate pairs, then the
.Fn c16rtomb
function will expect to receive that code point in that fashion.
.Pp
This behavior of the
.Fn c16rtomb
and
.Fn c32rtomb
functions should not be relied upon, is not portable, and is subject to
change for non-Unicode locales.
.Sh RETURN VALUES
Upon successful completion, the
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions return the number of bytes stored in
.Fa str .
Otherwise,
.Sy (size_t)-1
is returned to indicate an encoding error and
.Va errno
is set.
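.Pp
A caller might check the return value along the following lines.
This is only a sketch: the function name encode_one and its convention
of returning an errno value are assumptions made for illustration.

```c
#include <errno.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>

/*
 * Convert one wide character into out, which must have room for at
 * least MB_LEN_MAX bytes. On success, store the byte count in *lenp
 * and return 0. On failure, return the errno value set by wcrtomb().
 * Hypothetical helper for illustration only.
 */
static int
encode_one(char *out, size_t *lenp, wchar_t wc)
{
        mbstate_t mbs;
        size_t ret;

        (void) memset(&mbs, 0, sizeof (mbs));
        errno = 0;
        ret = wcrtomb(out, wc, &mbs);
        if (ret == (size_t)-1)
                return (errno);
        *lenp = ret;
        return (0);
}
```

Comparing against (size_t)-1 explicitly matters because size_t is
unsigned, so a test such as ret < 0 can never be true.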
.Sh EXAMPLES
.Sy Example 1
Converting a UTF-32 character into a multi-byte character sequence.
.Bd -literal
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <err.h>
#include <stdio.h>
#include <uchar.h>

int
main(void)
{
        mbstate_t mbs;
        size_t ret;
        char buf[MB_CUR_MAX];
        char32_t val = 0x5149;
        /* The UTF-8 encoding of U+5149 */
        const char *uchar_exp = "\exe5\ex85\ex89";

        /* Start from the initial conversion state */
        (void) memset(&mbs, 0, sizeof (mbs));
        (void) setlocale(LC_CTYPE, "en_US.UTF-8");
        ret = c32rtomb(buf, val, &mbs);
        if (ret != strlen(uchar_exp)) {
                errx(EXIT_FAILURE, "failed to convert string, got %zu",
                    ret);
        }

        if (strncmp(buf, uchar_exp, ret) != 0) {
                errx(EXIT_FAILURE, "converted char32_t does not match "
                    "expected value");
        }

        return (0);
}
.Ed
.Sh ERRORS
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions will fail if:
.Bl -tag -width Er
.It Er EINVAL
The conversion state in
.Fa ps
is invalid.
.It Er EILSEQ
An invalid character sequence has been detected.
.El
.Sh MT-LEVEL
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions are
.Sy MT-Safe
as long as different
.Vt mbstate_t
structures are passed in
.Fa ps .
If
.Fa ps
is
.Dv NULL
or different threads use the same value for
.Fa ps ,
then the functions are
.Sy Unsafe .
.Sh INTERFACE STABILITY
.Sy Committed
.Sh SEE ALSO
.Xr mbrtoc16 3C ,
.Xr mbrtoc32 3C ,
.Xr mbrtowc 3C ,
.Xr newlocale 3C ,
.Xr setlocale 3C ,
.Xr uselocale 3C ,
.Xr uchar.h 3HEAD ,
.Xr environ 5