.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source. A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright 2020 Robert Mustacchi
.\"
.Dd April 23, 2020
.Dt C16RTOMB 3C
.Os
.Sh NAME
.Nm c16rtomb ,
.Nm c32rtomb ,
.Nm wcrtomb ,
.Nm wcrtomb_l
.Nd convert wide-characters to character sequences
.Sh SYNOPSIS
.In uchar.h
.Ft size_t
.Fo c16rtomb
.Fa "char *restrict str"
.Fa "char16_t c16"
.Fa "mbstate_t *restrict ps"
.Fc
.Ft size_t
.Fo c32rtomb
.Fa "char *restrict str"
.Fa "char32_t c32"
.Fa "mbstate_t *restrict ps"
.Fc
.In stdio.h
.Ft size_t
.Fo wcrtomb
.Fa "char *restrict str"
.Fa "wchar_t wc"
.Fa "mbstate_t *restrict ps"
.Fc
.In stdio.h
.In xlocale.h
.Ft size_t
.Fo wcrtomb_l
.Fa "char *restrict str"
.Fa "wchar_t wc"
.Fa "mbstate_t *restrict ps"
.Fa "locale_t loc"
.Fc
.Sh DESCRIPTION
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions convert wide-character sequences into a series of multi-byte
characters.
The functions work in the following formats:
.Bl -tag -width wcrtomb_l
.It Fn c16rtomb
A UTF-16 code sequence, where every code point is represented by one or
two
.Vt char16_t .
The UTF-16 encoding will encode certain Unicode code points as a pair of
two 16-bit code sequences, commonly referred to as a surrogate pair.
.It Fn c32rtomb
A UTF-32 code sequence, where every code point is represented by a
single
.Vt char32_t .
It is illegal to pass reserved Unicode code points.
.It Fn wcrtomb , Fn wcrtomb_l
Wide characters, being a 32-bit value where every code point is
represented by a single
.Vt wchar_t .
While the
.Vt wchar_t
and
.Vt char32_t
are different types, in this implementation, they are similar encodings.
.El
.Pp
The functions all work by looking at the passed in wide-character
.Po
.Fa c16 ,
.Fa c32 ,
.Fa wc
.Pc
and appending it to the current conversion state,
.Fa ps .
Once a valid code point, based on the current locale, is found, then it
will be converted into a series of characters that are stored in
.Fa str .
Up to
.Dv MB_CUR_MAX
bytes will be stored in
.Fa str .
It is the caller's responsibility to ensure that there is sufficient
space in
.Fa str .
.Pp
The functions are all influenced by the
.Dv LC_CTYPE
category of the current locale for determining what is considered a
valid character.
For example, in the
.Sy C
locale,
only ASCII characters are recognized, while in a
.Sy UTF-8
based locale like
.Sy en_US.UTF-8 ,
all valid Unicode code points are recognized and will be converted into
the corresponding multi-byte sequence.
The
.Fn wcrtomb_l
function uses the locale passed in
.Fa loc
rather than the locale of the current thread.
.Pp
The
.Fa ps
argument represents a multi-byte conversion state which can be used
across multiple calls to a given function
.Pq but not mixed between functions .
This allows characters to be consumed from subsequent buffers, e.g.
different values of
.Fa str .
The functions may be called from multiple threads as long as they use
unique values for
.Fa ps .
If
.Fa ps
is
.Dv NULL ,
then a function-specific buffer will be used for the conversion state;
however, this is shared between all threads and its use is not
recommended.
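.Pp
As a minimal sketch of carrying conversion state across calls, the
following hypothetical helper,
.Fn encode_pair ,
passes both halves of a UTF-16 surrogate pair to
.Fn c16rtomb
using a single
.Vt mbstate_t .
It assumes that the caller has already selected a UTF-8 based locale and
that the first call records the high surrogate in the conversion state
without producing any output; neither assumption is guaranteed on every
system.
.Bd -literal
```c
#include <locale.h>
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: encode U+1F600, supplied as the UTF-16 surrogate
 * pair 0xd83d 0xde00, into out. The caller must provide at least
 * MB_CUR_MAX bytes and must already have set a UTF-8 based locale.
 * Returns the number of bytes written, or (size_t)-1 on failure.
 */
size_t
encode_pair(char *out)
{
	mbstate_t mbs;
	size_t first, second;

	/* Start from the initial conversion state. */
	(void) memset(&mbs, 0, sizeof (mbs));

	/* A high surrogate alone is not a code point; mbs remembers it. */
	first = c16rtomb(out, 0xd83d, &mbs);
	if (first == (size_t)-1)
		return (first);

	/* The low surrogate completes the pair using the same state. */
	second = c16rtomb(out + first, 0xde00, &mbs);
	if (second == (size_t)-1)
		return (second);

	return (first + second);
}
```
.Ed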
.Pp
The functions all have a special behavior when
.Dv NULL
is passed for
.Fa str .
Instead, they behave as though the null wide-character was
passed in
.Fa c16 ,
.Fa c32 ,
or
.Fa wc
and an internal buffer
.Pq buf
will be used to write out the results of the
conversion.
In other words, the functions would be called as:
.Bd -literal -offset indent
c16rtomb(buf, L'\\0', ps)
c32rtomb(buf, L'\\0', ps)
wcrtomb(buf, L'\\0', ps)
wcrtomb_l(buf, L'\\0', ps, loc)
.Ed
.Ss Locale Details
Not all locales in the system are Unicode based locales.
For example, ISO 8859 family locales have code points with values that
do not match their counterparts in Unicode.
When using these functions with non-Unicode based locales, the code
points returned will be those determined by the locale.
They will not be converted from the corresponding Unicode code point.
For example, if using the Euro sign in ISO 8859-15, these functions
will not encode the Unicode value 0x20ac into the ISO 8859-15 value
0xa4.
.Pp
Regardless of the locale, the characters returned will be encoded as
though the code point were the corresponding value in Unicode.
This means that when using UTF-16, if the corresponding code point were
in the range for surrogate pairs, then the
.Fn c16rtomb
function will expect to receive that code point in that fashion.
.Pp
This behavior of the
.Fn c16rtomb
and
.Fn c32rtomb
functions should not be relied upon, is not portable, and is subject to
change for non-Unicode locales.
.Sh RETURN VALUES
Upon successful completion, the
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions return the number of bytes stored in
.Fa str .
Otherwise,
.Sy (size_t)-1
is returned to indicate an encoding error and
.Va errno
is set.
.Sh EXAMPLES
.Sy Example 1
Converting a UTF-32 character into a multi-byte character sequence.
.Bd -literal
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <err.h>
#include <stdio.h>
#include <uchar.h>

int
main(void)
{
	mbstate_t mbs;
	size_t ret;
	char buf[MB_CUR_MAX];
	char32_t val = 0x5149;
	const char *uchar_exp = "\exe5\ex85\ex89";

	(void) memset(&mbs, 0, sizeof (mbs));
	(void) setlocale(LC_CTYPE, "en_US.UTF-8");
	ret = c32rtomb(buf, val, &mbs);
	if (ret != strlen(uchar_exp)) {
		errx(EXIT_FAILURE, "failed to convert string, got %zu",
		    ret);
	}

	if (strncmp(buf, uchar_exp, ret) != 0) {
		errx(EXIT_FAILURE, "converted char32_t does not match "
		    "expected value");
	}

	return (0);
}
.Ed
.Sh ERRORS
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions will fail if:
.Bl -tag -width Er
.It Er EINVAL
The conversion state in
.Fa ps
is invalid.
.It Er EILSEQ
An invalid character sequence has been detected.
.El
.Sh MT-LEVEL
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions are
.Sy MT-Safe
as long as different
.Vt mbstate_t
structures are passed in
.Fa ps .
If
.Fa ps
is
.Dv NULL
or different threads use the same value for
.Fa ps ,
then the functions are
.Sy Unsafe .
.Sh INTERFACE STABILITY
.Sy Committed
.Sh SEE ALSO
.Xr mbrtoc16 3C ,
.Xr mbrtoc32 3C ,
.Xr mbrtowc 3C ,
.Xr newlocale 3C ,
.Xr setlocale 3C ,
.Xr uselocale 3C ,
.Xr uchar.h 3HEAD ,
.Xr environ 5