.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source. A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright 2020 Robert Mustacchi
.\"
.Dd April 23, 2020
.Dt C16RTOMB 3C
.Os
.Sh NAME
.Nm c16rtomb ,
.Nm c32rtomb ,
.Nm wcrtomb ,
.Nm wcrtomb_l
.Nd convert wide-characters to character sequences
.Sh SYNOPSIS
.In uchar.h
.Ft size_t
.Fo c16rtomb
.Fa "char *restrict str"
.Fa "char16_t c16"
.Fa "mbstate_t *restrict ps"
.Fc
.Ft size_t
.Fo c32rtomb
.Fa "char *restrict str"
.Fa "char32_t c32"
.Fa "mbstate_t *restrict ps"
.Fc
.In wchar.h
.Ft size_t
.Fo wcrtomb
.Fa "char *restrict str"
.Fa "wchar_t wc"
.Fa "mbstate_t *restrict ps"
.Fc
.In wchar.h
.In xlocale.h
.Ft size_t
.Fo wcrtomb_l
.Fa "char *restrict str"
.Fa "wchar_t wc"
.Fa "mbstate_t *restrict ps"
.Fa "locale_t loc"
.Fc
.Sh DESCRIPTION
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions convert wide-character sequences into a series of multi-byte
characters.
The functions work in the following formats:
.Bl -tag -width wcrtomb_l
.It Fn c16rtomb
A UTF-16 code sequence, where every code point is represented by one or
two
.Vt char16_t .
The UTF-16 encoding will encode certain Unicode code points as a pair of
two 16-bit code sequences, commonly referred to as a surrogate pair.
.It Fn c32rtomb
A UTF-32 code sequence, where every code point is represented by a
single
.Vt char32_t .
It is illegal to pass reserved Unicode code points.
.It Fn wcrtomb , Fn wcrtomb_l
Wide characters, being a 32-bit value where every code point is
represented by a single
.Vt wchar_t .
While the
.Vt wchar_t
and
.Vt char32_t
are different types, in this implementation, they are similar encodings.
.El
.Pp
The functions all work by looking at the passed in wide-character
.Po
.Fa c16 ,
.Fa c32 ,
.Fa wc
.Pc
and appending it to the current conversion state,
.Fa ps .
Once a valid code point, based on the current locale, is found, then it
will be converted into a series of characters that are stored in
.Fa str .
Up to
.Dv MB_CUR_MAX
bytes will be stored in
.Fa str .
It is the caller's responsibility to ensure that there is sufficient
space in
.Fa str .
.Pp
The functions are all influenced by the
.Dv LC_CTYPE
category of the current locale for determining what is considered a
valid character.
For example, in the
.Sy C
locale,
only ASCII characters are recognized, while in a
.Sy UTF-8
based locale like
.Sy en_US.UTF-8 ,
all valid Unicode code points are recognized and will be converted into
the corresponding multi-byte sequence.
The
.Fn wcrtomb_l
function uses the locale passed in
.Fa loc
rather than the locale of the current thread.
.Pp
The
.Fa ps
argument represents a multi-byte conversion state which can be used
across multiple calls to a given function
.Pq but not mixed between functions .
This allows for characters to be consumed from subsequent buffers, e.g.
different values of
.Fa str .
The functions may be called from multiple threads as long as they use
unique values for
.Fa ps .
If
.Fa ps
is
.Dv NULL ,
then a function-specific buffer will be used for the conversion state;
however, this is shared between all threads and its use is not
recommended.
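.Pp
As a minimal sketch of carrying conversion state across calls, the
following program feeds a UTF-16 surrogate pair to
.Fn c16rtomb
one code unit at a time through a single caller-owned
.Fa ps .
The helper name
.Fn encode_pair
is hypothetical, and the program assumes that a UTF-8 locale named
.Sy en_US.UTF-8
is installed.
The first call is expected to store no bytes and only record the high
surrogate in
.Fa ps ;
the second call completes the code point.
.Bd -literal -offset indent
#include <limits.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: convert one surrogate pair, keeping the
 * state in a local mbstate_t so that it is private to the caller.
 * Returns the total number of bytes stored in out, or (size_t)-1.
 */
static size_t
encode_pair(char *out, char16_t hi, char16_t lo)
{
	mbstate_t mbs;
	size_t first, second;

	(void) memset(&mbs, 0, sizeof (mbs));
	/* The high surrogate alone is not a complete code point. */
	first = c16rtomb(out, hi, &mbs);
	if (first == (size_t)-1)
		return (first);
	/* The low surrogate completes it; the same mbs must be used. */
	second = c16rtomb(out + first, lo, &mbs);
	if (second == (size_t)-1)
		return (second);
	return (first + second);
}

int
main(void)
{
	char buf[MB_LEN_MAX];
	size_t len;

	if (setlocale(LC_CTYPE, "en_US.UTF-8") == NULL) {
		(void) fprintf(stderr, "UTF-8 locale unavailable\en");
		return (1);
	}
	/* U+1F600 is the surrogate pair 0xd83d 0xde00 in UTF-16. */
	len = encode_pair(buf, 0xd83d, 0xde00);
	if (len == (size_t)-1) {
		perror("c16rtomb");
		return (1);
	}
	(void) printf("stored %zu bytes\en", len);
	return (0);
}
.Ed
.Pp
Because each call to the helper uses its own
.Vt mbstate_t ,
it does not touch the shared internal state that is used when
.Fa ps
is
.Dv NULL .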
.Pp
The functions all have a special behavior when
.Dv NULL
is passed for
.Fa str .
They instead will treat it as though the null wide-character was
passed in
.Fa c16 ,
.Fa c32 ,
or
.Fa wc
and an internal buffer
.Pq buf
will be used to write out the results of the
conversion.
In other words, the functions would be called as:
.Bd -literal -offset indent
c16rtomb(buf, L'\\0', ps)
c32rtomb(buf, L'\\0', ps)
wcrtomb(buf, L'\\0', ps)
wcrtomb_l(buf, L'\\0', ps, loc)
.Ed
.Ss Locale Details
Not all locales in the system are Unicode based locales.
For example, ISO 8859 family locales have code points with values that
do not match their counterparts in Unicode.
When using these functions with non-Unicode based locales, the code
points returned will be those determined by the locale.
They will not be converted from the corresponding Unicode code point.
For example, if using the Euro sign in ISO 8859-15, these functions
will not encode the Unicode value 0x20ac into the ISO 8859-15 value
0xa4.
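.Pp
To observe this on a particular system, a sketch such as the following
may be used.
It assumes that a locale named
.Sy en_US.ISO8859-15
is installed, which may not be the case everywhere.
The outcome is deliberately not predicted here; the program only
reports whether the Unicode value was rejected or which bytes were
stored.
.Bd -literal -offset indent
#include <errno.h>
#include <limits.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <uchar.h>

int
main(void)
{
	mbstate_t mbs;
	char buf[MB_LEN_MAX];
	size_t i, ret;

	/* Assumption: this locale name exists on the system. */
	if (setlocale(LC_CTYPE, "en_US.ISO8859-15") == NULL) {
		(void) fprintf(stderr, "locale unavailable\en");
		return (1);
	}
	(void) memset(&mbs, 0, sizeof (mbs));
	/* 0x20ac is the Unicode code point of the Euro sign. */
	ret = c32rtomb(buf, 0x20ac, &mbs);
	if (ret == (size_t)-1) {
		(void) printf("not converted: %s\en", strerror(errno));
		return (0);
	}
	/* Print whatever bytes the locale produced. */
	for (i = 0; i < ret; i++)
		(void) printf("0x%02x ", (unsigned char)buf[i]);
	(void) printf("\en");
	return (0);
}
.Ed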
.Pp
Regardless of the locale, the characters returned will be encoded as
though the code point were the corresponding value in Unicode.
This means that when using UTF-16, if the corresponding code point were
in the range for surrogate pairs, then the
.Fn c16rtomb
function will expect to receive that code point in that fashion.
.Pp
This behavior of the
.Fn c16rtomb
and
.Fn c32rtomb
functions should not be relied upon, is not portable, and is subject to
change for non-Unicode locales.
.Sh RETURN VALUES
Upon successful completion, the
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions return the number of bytes stored in
.Fa str .
Otherwise,
.Sy (size_t)-1
is returned to indicate an encoding error and
.Va errno
is set.
.Sh EXAMPLES
.Sy Example 1
Converting a UTF-32 character into a multi-byte character sequence.
.Bd -literal
#include <locale.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <err.h>
#include <stdio.h>
#include <uchar.h>

int
main(void)
{
	mbstate_t mbs;
	size_t ret;
	/*
	 * MB_CUR_MAX depends on the locale in effect when it is
	 * evaluated, so size the buffer with MB_LEN_MAX instead.
	 */
	char buf[MB_LEN_MAX];
	char32_t val = 0x5149;
	const char *uchar_exp = "\exe5\ex85\ex89";

	(void) memset(&mbs, 0, sizeof (mbs));
	(void) setlocale(LC_CTYPE, "en_US.UTF-8");
	ret = c32rtomb(buf, val, &mbs);
	if (ret != strlen(uchar_exp)) {
		errx(EXIT_FAILURE, "failed to convert string, got %zu",
		    ret);
	}

	if (strncmp(buf, uchar_exp, ret) != 0) {
		errx(EXIT_FAILURE, "converted char32_t does not match "
		    "expected value");
	}

	return (0);
}
.Ed
.Sh ERRORS
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions will fail if:
.Bl -tag -width Er
.It Er EINVAL
The conversion state in
.Fa ps
is invalid.
.It Er EILSEQ
An invalid character sequence has been detected.
.El
.Sh MT-LEVEL
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions are
.Sy MT-Safe
as long as different
.Vt mbstate_t
structures are passed in
.Fa ps .
If
.Fa ps
is
.Dv NULL
or different threads use the same value for
.Fa ps ,
then the functions are
.Sy Unsafe .
.Sh INTERFACE STABILITY
.Sy Committed
.Sh SEE ALSO
.Xr mbrtoc16 3C ,
.Xr mbrtoc32 3C ,
.Xr mbrtowc 3C ,
.Xr newlocale 3C ,
.Xr setlocale 3C ,
.Xr uselocale 3C ,
.Xr uchar.h 3HEAD ,
.Xr environ 5