.\" Copyright (c) 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Donn Seeley of BSDI.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"	This product includes software developed by the University of
.\"	California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"	@(#)multibyte.3	8.1 (Berkeley) 6/4/93
.\" $FreeBSD$
.\"
.Dd June 4, 1993
.Dt MULTIBYTE 3
.Os
.Sh NAME
.Nm mblen ,
.Nm mbstowcs ,
.Nm mbtowc ,
.Nm wcstombs ,
.Nm wctomb
.Nd multibyte character support for C
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In stdlib.h
.Ft int
.Fn mblen "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn mbstowcs "wchar_t *wcstring" "const char *mbstring" "size_t nwchars"
.Ft int
.Fn mbtowc "wchar_t *wcharp" "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn wcstombs "char *mbstring" "const wchar_t *wcstring" "size_t nbytes"
.Ft int
.Fn wctomb "char *mbchar" "wchar_t wchar"
.Sh DESCRIPTION
The basic elements of some written natural languages such as Chinese
cannot be represented uniquely with single C
.Va char Ns s .
The C standard supports two different ways of dealing with
extended natural language encodings,
.Em wide
characters and
.Em multibyte
characters.
Wide characters are an internal representation
which allows each basic element to map
to a single object of type
.Va wchar_t .
Multibyte characters are used for input and output
and encode each basic element as a sequence of C
.Va char Ns s .
Individual basic elements may map into one or more
(up to
.Dv MB_CUR_MAX )
bytes in a multibyte character.
.Pp
The current locale
.Pq Xr setlocale 3
governs the interpretation of wide and multibyte characters.
The locale category
.Dv LC_CTYPE
specifically controls this interpretation.
The
.Va wchar_t
type is wide enough to hold the largest value
in the wide character representations for all locales.
.Pp
Multibyte strings may contain
.Sq shift
indicators to switch to and from
particular modes within the given representation.
If explicit bytes are used to signal shifting,
these are not recognized as separate characters
but are lumped with a neighboring character.
There is always a distinguished
.Sq initial
shift state.
The
.Fn mbstowcs
and
.Fn wcstombs
functions assume that multibyte strings are interpreted
starting from the initial shift state.
The
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions maintain static shift state internally.
A call with a null
.Fa mbchar
pointer returns nonzero if the current locale requires shift states,
zero otherwise;
if shift states are required, the shift state is reset to the initial state.
The internal shift states are undefined after a call to
.Fn setlocale
with the
.Dv LC_CTYPE
or
.Dv LC_ALL
categories.
.Pp
For convenience in processing,
the wide character with value 0
(the null wide character)
is recognized as the wide character string terminator,
and the character with value 0
(the null byte)
is recognized as the multibyte character string terminator.
Null bytes are not permitted within multibyte characters.
.Pp
The
.Fn mblen
function computes the length in bytes
of a multibyte character
.Fa mbchar .
Up to
.Fa nbytes
bytes are examined.
.Pp
The
.Fn mbtowc
function converts a multibyte character
.Fa mbchar
into a wide character and stores the result
in the object pointed to by
.Fa wcharp .
Up to
.Fa nbytes
bytes are examined.
.Pp
The
.Fn wctomb
function converts a wide character
.Fa wchar
into a multibyte character and stores
the result in
.Fa mbchar .
The object pointed to by
.Fa mbchar
must be large enough to accommodate the multibyte character.
.Pp
The
.Fn mbstowcs
function converts a multibyte character string
.Fa mbstring
into a wide character string
.Fa wcstring .
No more than
.Fa nwchars
wide characters are stored.
A terminating null wide character is appended if there is room.
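.Pp
As an illustration of the
.Fn mbstowcs
behavior described above, the following sketch converts a multibyte string
into a newly allocated wide character string.
The helper name
.Fn mbs_to_wcs_dup
is hypothetical and is not part of the C library.
The buffer size assumes that every wide character consumes
at least one byte of input, so the byte length plus one is always enough room
for the result and its terminator.
.Bd -literal -offset indent
```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not part of libc): convert a multibyte string
 * to a newly allocated wide character string under the current locale.
 * Returns NULL on an invalid sequence or on allocation failure.
 * The caller frees the result.
 */
wchar_t *
mbs_to_wcs_dup(const char *mbstring)
{
	size_t n, bufsz;
	wchar_t *wcs;

	/* One slot per input byte plus one for the terminator. */
	bufsz = strlen(mbstring) + 1;
	wcs = malloc(bufsz * sizeof(*wcs));
	if (wcs == NULL)
		return (NULL);

	n = mbstowcs(wcs, mbstring, bufsz);
	if (n == (size_t)-1) {
		/* An invalid multibyte character was encountered. */
		free(wcs);
		return (NULL);
	}

	/* n < bufsz here, so mbstowcs() had room to store the terminator. */
	return (wcs);
}
```
.Ed
.Pp
Checking the result against
.Li (size_t)-1
before using the buffer matters, because the contents of
.Fa wcstring
are not meaningful after a failed conversion.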
.Pp
The
.Fn wcstombs
function converts a wide character string
.Fa wcstring
into a multibyte character string
.Fa mbstring .
Up to
.Fa nbytes
bytes are stored in
.Fa mbstring .
Partial multibyte characters at the end of the string are not stored.
The multibyte character string is null terminated if there is room.
.Sh RETURN VALUES
If multibyte characters are not supported in the current locale,
every character occupies a single byte,
and these functions report lengths and counts accordingly.
.Pp
If
.Fa mbchar
is
.Dv NULL ,
the
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions return nonzero if shift states are supported,
zero otherwise.
If
.Fa mbchar
is valid,
then these functions return
the number of bytes processed in
.Fa mbchar ,
or \-1 if no multibyte character
could be recognized or converted.
.Pp
The
.Fn mbstowcs
function returns the number of wide characters converted,
not counting any terminating null wide character.
The
.Fn wcstombs
function returns the number of bytes converted,
not counting any terminating null byte.
If any invalid multibyte characters are encountered,
both functions return
.Li (size_t)\-1 .
.Sh SEE ALSO
.Xr mbrune 3 ,
.Xr rune 3 ,
.Xr setlocale 3 ,
.Xr euc 4 ,
.Xr utf2 4
.Sh STANDARDS
The
.Fn mblen ,
.Fn mbstowcs ,
.Fn mbtowc ,
.Fn wcstombs
and
.Fn wctomb
functions conform to
.St -isoC .
.Sh BUGS
The current implementation does not support shift states.
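.Sh EXAMPLES
The following is a minimal sketch, not a definitive implementation,
of walking a multibyte string one character at a time with
.Fn mblen .
The helper name
.Fn mb_count_chars
is hypothetical and is not part of the C library.
It resets the internal shift state with a null pointer call before scanning
and stops with an error as soon as
.Fn mblen
reports an unrecognizable character.
.Bd -literal -offset indent
```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libc): count the multibyte
 * characters in a null-terminated string under the current locale.
 * Returns -1 if an invalid multibyte character is found.
 */
long
mb_count_chars(const char *s)
{
	size_t left = strlen(s);
	long count = 0;
	int len;

	/* A null pointer call resets the internal shift state. */
	(void)mblen(NULL, 0);

	while (left > 0) {
		len = mblen(s, left);
		if (len < 0)
			return (-1);	/* invalid or incomplete character */
		if (len == 0)
			break;		/* null byte: end of string */
		s += len;
		left -= (size_t)len;
		count++;
	}
	return (count);
}
```
.Ed
.Pp
Because
.Fn mblen
keeps its shift state in static storage,
a loop like this one is not safe to run from several threads at once.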