.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source. A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright 2020 Robert Mustacchi
.\"
.Dd April 23, 2020
.Dt C16RTOMB 3C
.Os
.Sh NAME
.Nm c16rtomb ,
.Nm c32rtomb ,
.Nm wcrtomb ,
.Nm wcrtomb_l
.Nd convert wide-characters to character sequences
.Sh SYNOPSIS
.In uchar.h
.Ft size_t
.Fo c16rtomb
.Fa "char *restrict str"
.Fa "char16_t c16"
.Fa "mbstate_t *restrict ps"
.Fc
.Ft size_t
.Fo c32rtomb
.Fa "char *restrict str"
.Fa "char32_t c32"
.Fa "mbstate_t *restrict ps"
.Fc
.In wchar.h
.Ft size_t
.Fo wcrtomb
.Fa "char *restrict str"
.Fa "wchar_t wc"
.Fa "mbstate_t *restrict ps"
.Fc
.In wchar.h
.In xlocale.h
.Ft size_t
.Fo wcrtomb_l
.Fa "char *restrict str"
.Fa "wchar_t wc"
.Fa "mbstate_t *restrict ps"
.Fa "locale_t loc"
.Fc
.Sh DESCRIPTION
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions convert wide-character sequences into a series of multi-byte
characters.
The functions work in the following formats:
.Bl -tag -width wcrtomb_l
.It Fn c16rtomb
A UTF-16 code sequence, where every code point is represented by one or
two
.Vt char16_t .
The UTF-16 encoding will encode certain Unicode code points as a pair of
16-bit code units, commonly referred to as a surrogate pair.
.It Fn c32rtomb
A UTF-32 code sequence, where every code point is represented by a
single
.Vt char32_t .
It is illegal to pass reserved Unicode code points.
.It Fn wcrtomb , Fn wcrtomb_l
Wide characters, being a 32-bit value where every code point is
represented by a single
.Vt wchar_t .
While the
.Vt wchar_t
and
.Vt char32_t
are different types, in this implementation, they are similar encodings.
.El
.Pp
The functions all work by looking at the passed in wide-character
.Po
.Fa c16 ,
.Fa c32 ,
.Fa wc
.Pc
and appending it to the current conversion state,
.Fa ps .
Once a valid code point, based on the current locale, is found, then it
will be converted into a series of characters that are stored in
.Fa str .
Up to
.Dv MB_CUR_MAX
bytes will be stored in
.Fa str .
It is the caller's responsibility to ensure that there is sufficient
space in
.Fa str .
.Pp
The functions are all influenced by the
.Dv LC_CTYPE
category of the current locale for determining what is considered a
valid character.
For example, in the
.Sy C
locale,
only ASCII characters are recognized, while in a
.Sy UTF-8
based locale like
.Sy en_US.UTF-8 ,
all valid Unicode code points are recognized and will be converted into
the corresponding multi-byte sequence.
The
.Fn wcrtomb_l
function uses the locale passed in
.Fa loc
rather than the locale of the current thread.
.Pp
The
.Fa ps
argument represents a multi-byte conversion state which can be used
across multiple calls to a given function
.Pq but not mixed between functions .
This allows characters to be consumed from subsequent buffers, e.g.
different values of
.Fa str .
The functions may be called from multiple threads as long as they use
unique values for
.Fa ps .
If
.Fa ps
is
.Dv NULL ,
then a function-specific buffer will be used for the conversion state;
however, this buffer is shared between all threads and its use is not
recommended.
.Pp
The functions all have a special behavior when
.Dv NULL
is passed for
.Fa str .
They instead will treat it as though the null wide character was
passed in
.Fa c16 ,
.Fa c32 ,
or
.Fa wc
and an internal buffer
.Pq buf
will be used to write out the results of the
conversion.
In other words, the functions would be called as:
.Bd -literal -offset indent
c16rtomb(buf, L'\e0', ps)
c32rtomb(buf, L'\e0', ps)
wcrtomb(buf, L'\e0', ps)
wcrtomb_l(buf, L'\e0', ps, loc)
.Ed
.Ss Locale Details
Not all locales in the system are Unicode-based locales.
For example, ISO 8859 family locales have code points with values that
do not match their counterparts in Unicode.
When using these functions with non-Unicode based locales, the code
points returned will be those determined by the locale.
They will not be converted from the corresponding Unicode code point.
For example, if using the Euro sign in ISO 8859-15, these functions
will not encode the Unicode value 0x20ac into the ISO 8859-15 value
0xa4.
.Pp
Regardless of the locale, the characters returned will be encoded as
though the code point were the corresponding value in Unicode.
This means that when using UTF-16, if the corresponding code point were
in the range for surrogate pairs, then the
.Fn c16rtomb
function will expect to receive that code point as a surrogate pair.
.Pp
This behavior of the
.Fn c16rtomb
and
.Fn c32rtomb
functions should not be relied upon, is not portable, and is subject to
change for non-Unicode locales.
.Sh RETURN VALUES
Upon successful completion, the
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions return the number of bytes stored in
.Fa str .
Otherwise,
.Sy (size_t)-1
is returned to indicate an encoding error and
.Va errno
is set.
.Sh EXAMPLES
.Sy Example 1
Converting a UTF-32 character into a multi-byte character sequence.
.Bd -literal
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
#include <err.h>
#include <stdio.h>
#include <uchar.h>

int
main(void)
{
	mbstate_t mbs;
	size_t ret;
	/*
	 * MB_CUR_MAX reflects the locale in effect when it is evaluated,
	 * which here would be before setlocale(); size for the worst case.
	 */
	char buf[MB_LEN_MAX];
	char32_t val = 0x5149;
	const char *uchar_exp = "\exe5\ex85\ex89";

	(void) memset(&mbs, 0, sizeof (mbs));
	(void) setlocale(LC_CTYPE, "en_US.UTF-8");
	ret = c32rtomb(buf, val, &mbs);
	if (ret != strlen(uchar_exp)) {
		errx(EXIT_FAILURE, "failed to convert string, got %zd",
		    ret);
	}

	if (strncmp(buf, uchar_exp, ret) != 0) {
		errx(EXIT_FAILURE, "converted char32_t does not match "
		    "expected value");
	}

	return (0);
}
.Ed
.Sh ERRORS
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions will fail if:
.Bl -tag -width Er
.It Er EINVAL
The conversion state in
.Fa ps
is invalid.
.It Er EILSEQ
An invalid character sequence has been detected.
.El
.Sh MT-LEVEL
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions are
.Sy MT-Safe
as long as different
.Vt mbstate_t
structures are passed in
.Fa ps .
If
.Fa ps
is
.Dv NULL
or different threads use the same value for
.Fa ps ,
then the functions are
.Sy Unsafe .
.Sh INTERFACE STABILITY
.Sy Committed
.Sh SEE ALSO
.Xr mbrtoc16 3C ,
.Xr mbrtoc32 3C ,
.Xr mbrtowc 3C ,
.Xr newlocale 3C ,
.Xr setlocale 3C ,
.Xr uselocale 3C ,
.Xr uchar.h 3HEAD ,
.Xr environ 5