.\" Copyright (c) 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Donn Seeley of BSDI.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"	This product includes software developed by the University of
.\"	California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"	@(#)multibyte.3	8.1 (Berkeley) 6/4/93
.\"
.Dd June 4, 1993
.Dt MULTIBYTE 3
.Os
.Sh NAME
.Nm mblen ,
.Nm mbstowcs ,
.Nm mbtowc ,
.Nm wcstombs ,
.Nm wctomb
.Nd multibyte character support for C
.Sh SYNOPSIS
.Fd #include <stdlib.h>
.Ft int
.Fn mblen "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn mbstowcs "wchar_t *wcstring" "const char *mbstring" "size_t nwchars"
.Ft int
.Fn mbtowc "wchar_t *wcharp" "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn wcstombs "char *mbstring" "const wchar_t *wcstring" "size_t nbytes"
.Ft int
.Fn wctomb "char *mbchar" "wchar_t wchar"
.Sh DESCRIPTION
The basic elements of some written natural languages such as Chinese
cannot be represented uniquely with single C
.Va char Ns s .
The C standard supports two different ways of dealing with
extended natural language encodings,
.Em wide
characters and
.Em multibyte
characters.
Wide characters are an internal representation
which allows each basic element to map
to a single object of type
.Va wchar_t .
Multibyte characters are used for input and output
and code each basic element as a sequence of C
.Va char Ns s .
Individual basic elements may map into one or more
.Pq up to Dv MB_LEN_MAX
bytes in a multibyte character.
.Pp
The current locale
.Pq Xr setlocale 3
governs the interpretation of wide and multibyte characters.
The locale category
.Dv LC_CTYPE
specifically controls this interpretation.
The
.Va wchar_t
type is wide enough to hold the largest value
in the wide character representations for all locales.
.Pp
Multibyte strings may contain
.Sq shift
indicators to switch to and from
particular modes within the given representation.
If explicit bytes are used to signal shifting,
these are not recognized as separate characters
but are lumped with a neighboring character.
There is always a distinguished
.Sq initial
shift state.
The
.Fn mbstowcs
and
.Fn wcstombs
functions assume that multibyte strings are interpreted
starting from the initial shift state.
The
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions maintain static shift state internally.
A call with a null
.Fa mbchar
pointer returns nonzero if the current locale requires shift states,
zero otherwise;
if shift states are required, the shift state is reset to the initial state.
The internal shift states are undefined after a call to
.Fn setlocale
with the
.Dv LC_CTYPE
or
.Dv LC_ALL
categories.
.Pp
For convenience in processing,
the wide character with value 0
.Pq the null wide character
is recognized as the wide character string terminator,
and the character with value 0
.Pq the null byte
is recognized as the multibyte character string terminator.
Null bytes are not permitted within multibyte characters.
.Pp
The
.Fn mblen
function computes the length in bytes
of a multibyte character
.Fa mbchar .
Up to
.Fa nbytes
bytes are examined.
.Pp
The
.Fn mbtowc
function converts a multibyte character
.Fa mbchar
into a wide character and stores the result
in the object pointed to by
.Fa wcharp .
Up to
.Fa nbytes
bytes are examined.
.Pp
The
.Fn wctomb
function converts a wide character
.Fa wchar
into a multibyte character and stores
the result in
.Fa mbchar .
The object pointed to by
.Fa mbchar
must be large enough to accommodate the multibyte character.
.Pp
The
.Fn mbstowcs
function converts a multibyte character string
.Fa mbstring
into a wide character string
.Fa wcstring .
No more than
.Fa nwchars
wide characters are stored.
A terminating null wide character is appended if there is room.
.Pp
The
.Fn wcstombs
function converts a wide character string
.Fa wcstring
into a multibyte character string
.Fa mbstring .
Up to
.Fa nbytes
bytes are stored in
.Fa mbstring .
Partial multibyte characters at the end of the string are not stored.
The multibyte character string is null terminated if there is room.
.Sh RETURN VALUES
If multibyte characters are not supported in the current locale,
all of these functions will return \-1 if characters can be processed,
otherwise 0.
.Pp
If
.Fa mbchar
is
.Dv NULL ,
the
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions return nonzero if shift states are supported,
zero otherwise.
If
.Fa mbchar
is valid,
then these functions return
the number of bytes processed in
.Fa mbchar ,
or \-1 if no multibyte character
could be recognized or converted.
.Pp
The
.Fn mbstowcs
function returns the number of wide characters converted,
not counting any terminating null wide character.
The
.Fn wcstombs
function returns the number of bytes converted,
not counting any terminating null byte.
If any invalid multibyte characters are encountered,
both functions return \-1.
.Sh SEE ALSO
.Xr euc 4 ,
.Xr mbrune 3 ,
.Xr rune 3 ,
.Xr setlocale 3 ,
.Xr utf2 4
.Sh STANDARDS
The
.Fn mblen ,
.Fn mbstowcs ,
.Fn mbtowc ,
.Fn wcstombs
and
.Fn wctomb
functions conform to
.St -ansiC .
.Sh BUGS
The current implementation does not support shift states.