.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source. A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright 2020 Robert Mustacchi
.\"
.Dd April 23, 2020
.Dt C16RTOMB 3C
.Os
.Sh NAME
.Nm c16rtomb ,
.Nm c32rtomb ,
.Nm wcrtomb ,
.Nm wcrtomb_l
.Nd convert wide-characters to character sequences
.Sh SYNOPSIS
.In uchar.h
.Ft size_t
.Fo c16rtomb
.Fa "char *restrict str"
.Fa "char16_t c16"
.Fa "mbstate_t *restrict ps"
.Fc
.Ft size_t
.Fo c32rtomb
.Fa "char *restrict str"
.Fa "char32_t c32"
.Fa "mbstate_t *restrict ps"
.Fc
.In wchar.h
.Ft size_t
.Fo wcrtomb
.Fa "char *restrict str"
.Fa "wchar_t wc"
.Fa "mbstate_t *restrict ps"
.Fc
.In wchar.h
.In xlocale.h
.Ft size_t
.Fo wcrtomb_l
.Fa "char *restrict str"
.Fa "wchar_t wc"
.Fa "mbstate_t *restrict ps"
.Fa "locale_t loc"
.Fc
.Sh DESCRIPTION
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions convert wide-character sequences into a series of multi-byte
characters.
The functions accept their input in the following formats:
.Bl -tag -width wcrtomb_l
.It Fn c16rtomb
A UTF-16 code sequence, where every code point is represented by one or
two
.Vt char16_t
values.
UTF-16 encodes code points above U+FFFF as a pair of 16-bit code units,
commonly referred to as a surrogate pair.
.It Fn c32rtomb
A UTF-32 code sequence, where every code point is represented by a
single
.Vt char32_t .
It is illegal to pass reserved Unicode code points.
.It Fn wcrtomb , Fn wcrtomb_l
Wide characters, where every code point is represented by a single
32-bit
.Vt wchar_t .
While the
.Vt wchar_t
and
.Vt char32_t
are different types, in this implementation they use similar encodings.
.El
.Pp
Each function appends the passed-in wide character
.Po
.Fa c16 ,
.Fa c32 ,
or
.Fa wc
.Pc
to the current conversion state,
.Fa ps .
Once the conversion state holds a code point that is valid in the
current locale, that code point is converted into a multi-byte
character sequence that is stored in
.Fa str .
Up to
.Dv MB_CUR_MAX
bytes will be stored in
.Fa str .
It is the caller's responsibility to ensure that there is sufficient
space in
.Fa str ;
a buffer of
.Dv MB_LEN_MAX
bytes is large enough for any locale.
.Pp
The functions are all influenced by the
.Dv LC_CTYPE
category of the current locale, which determines what is considered a
valid character.
For example, in the
.Sy C
locale, only ASCII characters are recognized, while in a
.Sy UTF-8
based locale like
.Sy en_US.UTF-8 ,
all valid Unicode code points are recognized and will be converted into
the corresponding multi-byte sequence.
The
.Fn wcrtomb_l
function uses the locale passed in
.Fa loc
rather than the locale of the current thread.
.Pp
The
.Fa ps
argument represents a multi-byte conversion state which can be used
across multiple calls to a given function
.Pq but not mixed between functions .
This allows a conversion to continue across calls that use different
values of
.Fa str ,
for example, when the two halves of a surrogate pair are passed to
.Fn c16rtomb
in separate calls.
The functions may be called from multiple threads as long as they use
unique values for
.Fa ps .
If
.Fa ps
is
.Dv NULL ,
then a function-specific internal buffer is used for the conversion
state; however, this buffer is shared between all threads and its use
is not recommended.
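.Pp
As a minimal sketch of carrying state across calls, the following code
splits a supplementary-plane code point into a surrogate pair and passes
both halves to
.Fn c16rtomb
with one caller-owned
.Vt mbstate_t .
The helpers
.Fn split_surrogates
and
.Fn encode_pair
are hypothetical names used only for illustration.
The sketch assumes that
.Dv LC_CTYPE
has already been set to a UTF-8 locale and that
.Fa out
has room for at least
.Dv MB_LEN_MAX
bytes.
The first call is expected to write nothing and only record the high
surrogate in the conversion state, but the code offsets the second call
by the first call's return value rather than relying on that.
.Bd -literal
```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: split a code point in the range U+10000 through
 * U+10FFFF into its UTF-16 high and low surrogates.
 */
void
split_surrogates(char32_t cp, char16_t *hi, char16_t *lo)
{
	cp -= 0x10000;
	*hi = (char16_t)(0xd800 + (cp >> 10));
	*lo = (char16_t)(0xdc00 + (cp & 0x3ff));
}

/*
 * Hypothetical helper: convert a supplementary-plane code point to a
 * multi-byte sequence by passing both surrogates to c16rtomb() with the
 * same conversion state.  Returns the total number of bytes written to
 * out, or (size_t)-1 with errno set by c16rtomb() on failure.
 */
size_t
encode_pair(char32_t cp, char *out)
{
	mbstate_t mbs;
	char16_t hi, lo;
	size_t first, second;

	/* Start from the initial conversion state. */
	(void) memset(&mbs, 0, sizeof (mbs));
	split_surrogates(cp, &hi, &lo);

	/* The high surrogate alone is not a complete code point. */
	first = c16rtomb(out, hi, &mbs);
	if (first == (size_t)-1)
		return ((size_t)-1);

	/* The low surrogate completes it; the same mbs must be passed. */
	second = c16rtomb(out + first, lo, &mbs);
	if (second == (size_t)-1)
		return ((size_t)-1);

	return (first + second);
}
```
.Ed
.Pp
For example, U+1F600 splits into 0xd83d and 0xde00, which a UTF-8
locale would be expected to encode as the four bytes 0xf0 0x9f 0x98
0x80.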
.Pp
The functions all have a special behavior when
.Dv NULL
is passed for
.Fa str .
They instead treat it as though the null wide character had been passed
in
.Fa c16 ,
.Fa c32 ,
or
.Fa wc ,
and write the results of the conversion to an internal buffer
.Pq buf .
In other words, the functions behave as though they were called as:
.Bd -literal -offset indent
c16rtomb(buf, L'\e0', ps)
c32rtomb(buf, L'\e0', ps)
wcrtomb(buf, L'\e0', ps)
wcrtomb_l(buf, L'\e0', ps, loc)
.Ed
.Ss Locale Details
Not all locales in the system are Unicode-based locales.
For example, ISO 8859 family locales have code points with values that
do not match their counterparts in Unicode.
When these functions are used with non-Unicode-based locales, the code
points are those determined by the locale; they are not converted from
the corresponding Unicode code point.
For example, when using the Euro sign in ISO 8859-15, these functions
will not encode the Unicode value 0x20ac into the ISO 8859-15 value
0xa4.
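.Pp
Because the bytes produced depend entirely on the
.Dv LC_CTYPE
category in effect, a program that needs a particular encoding can
select it explicitly for the calling thread.
The following minimal sketch uses
.Xr newlocale 3C
and
.Xr uselocale 3C
to convert one wide character with
.Fn wcrtomb
under a locale named by the caller, without changing the locale of
other threads.
The helper
.Fn convert_in_locale
is hypothetical.
Assuming a UTF-8 locale is named, the wide character 0x20ac is the
Unicode Euro sign and is expected to be written as the three bytes
0xe2 0x82 0xac.
.Bd -literal
```c
/* Request POSIX.1-2008 declarations (newlocale, uselocale). */
#define _POSIX_C_SOURCE 200809L

#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert one wide character using the LC_CTYPE
 * category of the named locale.  The calling thread's previous locale
 * is restored before returning, so other threads are unaffected.
 * out must have room for at least MB_LEN_MAX bytes.  Returns the number
 * of bytes written, or (size_t)-1 on failure.
 */
size_t
convert_in_locale(const char *name, wchar_t wc, char *out)
{
	locale_t loc, old;
	mbstate_t mbs;
	size_t ret;

	loc = newlocale(LC_CTYPE_MASK, name, (locale_t)0);
	if (loc == (locale_t)0)
		return ((size_t)-1);

	/* Switch only this thread to the new locale. */
	old = uselocale(loc);
	if (old == (locale_t)0) {
		freelocale(loc);
		return ((size_t)-1);
	}

	/* Start from the initial conversion state. */
	(void) memset(&mbs, 0, sizeof (mbs));
	ret = wcrtomb(out, wc, &mbs);

	/* Restore the previous locale before freeing the new one. */
	(void) uselocale(old);
	freelocale(loc);
	return (ret);
}
```
.Ed
.Pp
Where it is available,
.Fn wcrtomb_l
achieves the same result by taking the locale as an argument instead of
switching the thread's locale.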
.Pp
Regardless of the locale, the characters passed in are expected to be
encoded as though the code point were the corresponding value in
Unicode.
This means that when using UTF-16, if the corresponding code point is
in the range that requires a surrogate pair, the
.Fn c16rtomb
function expects to receive that code point as a surrogate pair.
.Pp
This behavior of the
.Fn c16rtomb
and
.Fn c32rtomb
functions should not be relied upon, is not portable, and is subject to
change for non-Unicode locales.
.Sh RETURN VALUES
Upon successful completion, the
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions return the number of bytes stored in
.Fa str .
Otherwise,
.Sy (size_t)-1
is returned to indicate an encoding error and
.Va errno
is set to indicate the error.
.Sh EXAMPLES
.Sy Example 1
Converting a UTF-32 character into a multi-byte character sequence.
.Bd -literal
#include <err.h>
#include <limits.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <uchar.h>

int
main(void)
{
	mbstate_t mbs;
	size_t ret;
	char buf[MB_LEN_MAX];	/* MB_CUR_MAX is not a constant */
	char32_t val = 0x5149;
	const char *uchar_exp = "\exe5\ex85\ex89";

	/* Start from the initial conversion state. */
	(void) memset(&mbs, 0, sizeof (mbs));
	if (setlocale(LC_CTYPE, "en_US.UTF-8") == NULL)
		errx(EXIT_FAILURE, "failed to set locale");

	ret = c32rtomb(buf, val, &mbs);
	if (ret == (size_t)-1)
		err(EXIT_FAILURE, "failed to convert char32_t");

	if (ret != strlen(uchar_exp)) {
		errx(EXIT_FAILURE, "failed to convert string, got %zu",
		    ret);
	}

	if (memcmp(buf, uchar_exp, ret) != 0) {
		errx(EXIT_FAILURE, "converted char32_t does not match "
		    "expected value");
	}

	return (0);
}
.Ed
.Sh ERRORS
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions will fail if:
.Bl -tag -width Er
.It Er EINVAL
The conversion state in
.Fa ps
is invalid.
.It Er EILSEQ
An invalid character sequence has been detected.
.El
.Sh MT-LEVEL
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions are
.Sy MT-Safe
as long as different
.Vt mbstate_t
structures are passed in
.Fa ps .
If
.Fa ps
is
.Dv NULL
or different threads use the same value for
.Fa ps ,
then the functions are
.Sy Unsafe .
.Sh INTERFACE STABILITY
.Sy Committed
.Sh SEE ALSO
.Xr mbrtoc16 3C ,
.Xr mbrtoc32 3C ,
.Xr mbrtowc 3C ,
.Xr newlocale 3C ,
.Xr setlocale 3C ,
.Xr uselocale 3C ,
.Xr uchar.h 3HEAD ,
.Xr environ 5