.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source. A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright 2020 Robert Mustacchi
.\"
.Dd February 17, 2023
.Dt C16RTOMB 3C
.Os
.Sh NAME
.Nm c16rtomb ,
.Nm c32rtomb ,
.Nm wcrtomb ,
.Nm wcrtomb_l
.Nd convert wide-characters to character sequences
.Sh SYNOPSIS
.In uchar.h
.Ft size_t
.Fo c16rtomb
.Fa "char *restrict str"
.Fa "char16_t c16"
.Fa "mbstate_t *restrict ps"
.Fc
.Ft size_t
.Fo c32rtomb
.Fa "char *restrict str"
.Fa "char32_t c32"
.Fa "mbstate_t *restrict ps"
.Fc
.In wchar.h
.Ft size_t
.Fo wcrtomb
.Fa "char *restrict str"
.Fa "wchar_t wc"
.Fa "mbstate_t *restrict ps"
.Fc
.In wchar.h
.In xlocale.h
.Ft size_t
.Fo wcrtomb_l
.Fa "char *restrict str"
.Fa "wchar_t wc"
.Fa "mbstate_t *restrict ps"
.Fa "locale_t loc"
.Fc
.Sh DESCRIPTION
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions convert wide-character sequences into a series of multi-byte
characters.
The functions work in the following formats:
.Bl -tag -width wcrtomb_l
.It Fn c16rtomb
A UTF-16 code sequence, where every code point is represented by one or
two
.Vt char16_t .
The UTF-16 encoding will encode certain Unicode code points as a pair of
16-bit code units, commonly referred to as a surrogate pair.
.It Fn c32rtomb
A UTF-32 code sequence, where every code point is represented by a
single
.Vt char32_t .
It is illegal to pass reserved Unicode code points.
.It Fn wcrtomb , Fn wcrtomb_l
Wide characters, being a 32-bit value where every code point is
represented by a single
.Vt wchar_t .
While the
.Vt wchar_t
and
.Vt char32_t
are different types, in this implementation, they are similar encodings.
.El
.Pp
The functions all work by looking at the passed in wide-character
.Po
.Fa c16 ,
.Fa c32 ,
.Fa wc
.Pc
and appending it to the current conversion state,
.Fa ps .
Once a valid code point, based on the current locale, is found, then it
will be converted into a series of characters that are stored in
.Fa str .
Up to
.Dv MB_CUR_MAX
bytes will be stored in
.Fa str .
It is the caller's responsibility to ensure that there is sufficient
space in
.Fa str .
.Pp
The functions are all influenced by the
.Dv LC_CTYPE
category of the current locale for determining what is considered a
valid character.
For example, in the
.Sy C
locale,
only ASCII characters are recognized, while in a
.Sy UTF-8
based locale like
.Sy en_US.UTF-8 ,
all valid Unicode code points are recognized and will be converted into
the corresponding multi-byte sequence.
The
.Fn wcrtomb_l
function uses the locale passed in
.Fa loc
rather than the locale of the current thread.
.Pp
The
.Fa ps
argument represents a multi-byte conversion state which can be used
across multiple calls to a given function
.Pq but not mixed between functions .
This allows for characters to be consumed from subsequent buffers, e.g.
different values of
.Fa str .
The functions may be called from multiple threads as long as they use
unique values for
.Fa ps .
If
.Fa ps
is
.Dv NULL ,
then a function-specific buffer will be used for the conversion state;
however, this is shared between all threads and its use is not
recommended.
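.Pp
As an illustrative sketch of the per-caller conversion state recommended
above, the following hypothetical helper,
.Fn c32s_to_mbs ,
which is not part of the C library, keeps a private
.Vt mbstate_t
on its stack so that concurrent callers never share state.
The helper's name, signature, and error convention are assumptions made
for this example only.

```c
#include <limits.h>
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper, not part of libc: convert a NUL-terminated
 * char32_t string into a multi-byte string in the current locale.
 * Returns the number of bytes written, excluding the terminating NUL,
 * or (size_t)-1 on an encoding error or if dst is too small.
 */
size_t
c32s_to_mbs(char *dst, size_t dstlen, const char32_t *src)
{
	mbstate_t mbs;		/* private state, never shared */
	char tmp[MB_LEN_MAX];	/* large enough for any one character */
	size_t off = 0;

	if (dstlen == 0)
		return ((size_t)-1);

	/* An all-zero mbstate_t is the initial conversion state. */
	(void) memset(&mbs, 0, sizeof (mbs));

	for (; *src != 0; src++) {
		size_t len = c32rtomb(tmp, *src, &mbs);

		if (len == (size_t)-1)
			return ((size_t)-1);	/* errno set by c32rtomb */
		/* Always leave one byte free for the terminating NUL. */
		if (len > dstlen - 1 - off)
			return ((size_t)-1);
		(void) memcpy(dst + off, tmp, len);
		off += len;
	}

	dst[off] = '\0';
	return (off);
}
```

In the default C locale, converting the two ASCII characters of U"hi"
with this sketch would be expected to store the string "hi" and return
2.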
.Pp
The functions all have a special behavior when
.Dv NULL
is passed for
.Fa str .
They instead will treat it as though the null wide-character was
passed in
.Fa c16 ,
.Fa c32 ,
or
.Fa wc
and an internal buffer
.Pq buf
will be used to write out the results of the
conversion.
In other words, the functions would be called as:
.Bd -literal -offset indent
c16rtomb(buf, L'\\0', ps)
c32rtomb(buf, L'\\0', ps)
wcrtomb(buf, L'\\0', ps)
wcrtomb_l(buf, L'\\0', ps, loc)
.Ed
.Ss Locale Details
Not all locales in the system are Unicode based locales.
For example, ISO 8859 family locales have code points with values that
do not match their counterparts in Unicode.
When using these functions with non-Unicode based locales, the code
points returned will be those determined by the locale.
They will not be converted from the corresponding Unicode code point.
For example, if using the Euro sign in ISO 8859-15, these functions
will not encode the Unicode value 0x20ac into the ISO 8859-15 value
0xa4.
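.Pp
Because the set of valid characters depends on the
.Dv LC_CTYPE
category, a program may want to probe whether a given code point can be
converted in the current locale before relying on it.
The following is a minimal sketch of such a probe; the helper name
.Fn c32_is_encodable
and its return convention are assumptions made for illustration and are
not part of the C library.

```c
#include <errno.h>
#include <limits.h>
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper, not part of libc: report whether c32 can be
 * converted in the current LC_CTYPE locale.
 * Returns 1 if the conversion succeeds, 0 if it fails with EILSEQ,
 * and -1 if it fails for any other reason.
 */
int
c32_is_encodable(char32_t c32)
{
	mbstate_t mbs;
	char buf[MB_LEN_MAX];	/* large enough for any one character */

	/* Start from the initial conversion state on every probe. */
	(void) memset(&mbs, 0, sizeof (mbs));

	errno = 0;
	if (c32rtomb(buf, c32, &mbs) != (size_t)-1)
		return (1);

	return (errno == EILSEQ ? 0 : -1);
}
```

Assuming the process is still in the default C locale, this sketch would
be expected to return 1 for U'A' and 0 for the Euro sign value 0x20ac;
results in non-Unicode locales follow the caveats in this section and
may differ between implementations.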
.Pp
Regardless of the locale, the characters returned will be encoded as
though the code point were the corresponding value in Unicode.
This means that when using UTF-16, if the corresponding code point were
in the range for surrogate pairs, then the
.Fn c16rtomb
function will expect to receive that code point in that fashion.
.Pp
This behavior of the
.Fn c16rtomb
and
.Fn c32rtomb
functions should not be relied upon, is not portable, and is subject to
change for non-Unicode locales.
.Sh RETURN VALUES
Upon successful completion, the
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions return the number of bytes stored in
.Fa str .
Otherwise,
.Sy (size_t)-1
is returned to indicate an encoding error and
.Va errno
is set.
.Sh EXAMPLES
.Sy Example 1
Converting a UTF-32 character into a multi-byte character sequence.
.Bd -literal
#include <locale.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <err.h>
#include <stdio.h>
#include <uchar.h>

int
main(void)
{
	mbstate_t mbs;
	size_t ret;
	/* MB_LEN_MAX is constant, unlike the locale-dependent MB_CUR_MAX */
	char buf[MB_LEN_MAX];
	char32_t val = 0x5149;
	const char *uchar_exp = "\exe5\ex85\ex89";

	(void) memset(&mbs, 0, sizeof (mbs));
	(void) setlocale(LC_CTYPE, "en_US.UTF-8");
	ret = c32rtomb(buf, val, &mbs);
	if (ret != strlen(uchar_exp)) {
		errx(EXIT_FAILURE, "failed to convert string, got %zu",
		    ret);
	}

	if (strncmp(buf, uchar_exp, ret) != 0) {
		errx(EXIT_FAILURE, "converted char32_t does not match "
		    "expected value");
	}

	return (0);
}
.Ed
.Sh ERRORS
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions will fail if:
.Bl -tag -width Er
.It Er EINVAL
The conversion state in
.Fa ps
is invalid.
.It Er EILSEQ
An invalid character sequence has been detected.
.El
.Sh MT-LEVEL
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions are
.Sy MT-Safe
as long as different
.Vt mbstate_t
structures are passed in
.Fa ps .
If
.Fa ps
is
.Dv NULL
or different threads use the same value for
.Fa ps ,
then the functions are
.Sy Unsafe .
.Sh INTERFACE STABILITY
.Sy Committed
.Sh SEE ALSO
.Xr mbrtoc16 3C ,
.Xr mbrtoc32 3C ,
.Xr mbrtowc 3C ,
.Xr newlocale 3C ,
.Xr setlocale 3C ,
.Xr uselocale 3C ,
.Xr uchar.h 3HEAD ,
.Xr environ 7