.\" Copyright (c) 1993
.\"    The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Donn Seeley of BSDI.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"        This product includes software developed by the University of
.\"        California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     @(#)multibyte.3 8.1 (Berkeley) 6/4/93
.\" $FreeBSD$
.\"
.Dd June 4, 1993
.Dt MULTIBYTE 3
.Os
.Sh NAME
.Nm mblen ,
.Nm mbstowcs ,
.Nm mbtowc ,
.Nm wcstombs ,
.Nm wctomb
.Nd multibyte character support for C
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.Fd #include <stdlib.h>
.Ft int
.Fn mblen "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn mbstowcs "wchar_t *wcstring" "const char *mbstring" "size_t nwchars"
.Ft int
.Fn mbtowc "wchar_t *wcharp" "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn wcstombs "char *mbstring" "const wchar_t *wcstring" "size_t nbytes"
.Ft int
.Fn wctomb "char *mbchar" "wchar_t wchar"
.Sh DESCRIPTION
The basic elements of some written natural languages such as Chinese
cannot be represented uniquely with single C
.Va char Ns s .
The C standard supports two different ways of dealing with
extended natural language encodings,
.Em wide
characters and
.Em multibyte
characters.
Wide characters are an internal representation
which allows each basic element to map
to a single object of type
.Va wchar_t .
Multibyte characters are used for input and output
and encode each basic element as a sequence of C
.Va char Ns s .
Individual basic elements may map into one or more
.Pq up to Dv MB_CUR_MAX
bytes in a multibyte character.
.Pp
The current locale
.Pq Xr setlocale 3
governs the interpretation of wide and multibyte characters.
The locale category
.Dv LC_CTYPE
specifically controls this interpretation.
The
.Va wchar_t
type is wide enough to hold the largest value
in the wide character representations for all locales.
.Pp
Multibyte strings may contain
.Sq shift
indicators to switch to and from
particular modes within the given representation.
If explicit bytes are used to signal shifting,
these are not recognized as separate characters
but are lumped with a neighboring character.
There is always a distinguished
.Sq initial
shift state.
The
.Fn mbstowcs
and
.Fn wcstombs
functions assume that multibyte strings are interpreted
starting from the initial shift state.
The
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions maintain static shift state internally.
A call with a null
.Fa mbchar
pointer returns nonzero if the current locale requires shift states,
zero otherwise;
if shift states are required, the shift state is reset to the initial state.
The internal shift states are undefined after a call to
.Fn setlocale
with the
.Dv LC_CTYPE
or
.Dv LC_ALL
categories.
.Pp
For convenience in processing,
the wide character with value 0
.Pq the null wide character
is recognized as the wide character string terminator,
and the character with value 0
.Pq the null byte
is recognized as the multibyte character string terminator.
Null bytes are not permitted within multibyte characters.
.Pp
The
.Fn mblen
function computes the length in bytes
of a multibyte character
.Fa mbchar .
Up to
.Fa nbytes
bytes are examined.
.Pp
The
.Fn mbtowc
function converts a multibyte character
.Fa mbchar
into a wide character and stores the result
in the object pointed to by
.Fa wcharp .
Up to
.Fa nbytes
bytes are examined.
.Pp
The
.Fn wctomb
function converts a wide character
.Fa wchar
into a multibyte character and stores
the result in
.Fa mbchar .
The object pointed to by
.Fa mbchar
must be large enough to accommodate the multibyte character.
.Pp
The
.Fn mbstowcs
function converts a multibyte character string
.Fa mbstring
into a wide character string
.Fa wcstring .
No more than
.Fa nwchars
wide characters are stored.
A terminating null wide character is appended if there is room.
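.Pp
As a minimal sketch of how the single-character and string interfaces
described above might be combined, the following C fragment counts the
characters in a multibyte string with
.Fn mblen
and converts a whole string with
.Fn mbstowcs .
The helper names
.Fn count_mbchars
and
.Fn mbs_to_wcs_alloc
are hypothetical and not part of the library.
The buffer sizing assumes only that every multibyte character
occupies at least one byte.
.Bd -literal -offset indent
```c
#include <stdlib.h>
#include <string.h>

/*
 * Count the multibyte characters in s using mblen().
 * Returns the count, or (size_t)-1 if an invalid sequence is found.
 * count_mbchars is a hypothetical helper name.
 */
size_t
count_mbchars(const char *s)
{
    size_t count = 0;
    size_t left = strlen(s);
    int n;

    (void)mblen(NULL, 0);       /* reset any internal shift state */
    while (left > 0) {
        n = mblen(s, left);
        if (n < 0)
            return ((size_t)-1);
        if (n == 0)             /* null byte; guard against looping */
            break;
        s += n;
        left -= (size_t)n;
        count++;
    }
    return (count);
}

/*
 * Convert s into a newly allocated wide character string.
 * Returns NULL on invalid input or allocation failure; the caller
 * frees the result.  mbs_to_wcs_alloc is a hypothetical helper name.
 */
wchar_t *
mbs_to_wcs_alloc(const char *s)
{
    /* A multibyte string of n bytes holds at most n characters. */
    size_t cap = strlen(s) + 1;
    wchar_t *w = malloc(cap * sizeof(*w));

    if (w == NULL)
        return (NULL);
    if (mbstowcs(w, s, cap) == (size_t)-1) {
        free(w);
        return (NULL);
    }
    w[cap - 1] = 0;             /* terminate even if the buffer filled */
    return (w);
}
```
.Ed
.Pp
Both helpers interpret their input according to the current
.Dv LC_CTYPE
locale, so a caller would normally invoke
.Xr setlocale 3
first.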
.Pp
The
.Fn wcstombs
function converts a wide character string
.Fa wcstring
into a multibyte character string
.Fa mbstring .
Up to
.Fa nbytes
bytes are stored in
.Fa mbstring .
Partial multibyte characters at the end of the string are not stored.
The multibyte character string is null terminated if there is room.
.Sh RETURN VALUES
If multibyte characters are not supported in the current locale,
all of these functions will return \-1 if characters can be processed,
otherwise 0.
.Pp
If
.Fa mbchar
is
.Dv NULL ,
the
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions return nonzero if shift states are supported,
zero otherwise.
If
.Fa mbchar
is valid,
then these functions return
the number of bytes processed in
.Fa mbchar ,
or \-1 if no multibyte character
could be recognized or converted.
.Pp
The
.Fn mbstowcs
function returns the number of wide characters converted,
not counting any terminating null wide character.
The
.Fn wcstombs
function returns the number of bytes converted,
not counting any terminating null byte.
If any invalid multibyte characters are encountered,
both functions return \-1.
.Sh SEE ALSO
.Xr mbrune 3 ,
.Xr rune 3 ,
.Xr setlocale 3 ,
.Xr euc 4 ,
.Xr utf2 4
.Sh STANDARDS
The
.Fn mblen ,
.Fn mbstowcs ,
.Fn mbtowc ,
.Fn wcstombs
and
.Fn wctomb
functions conform to
.St -ansiC .
.Sh BUGS
The current implementation does not support shift states.
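.Sh EXAMPLES
Because
.Fn wcstombs
null terminates its output only if there is room,
a caller that needs a guaranteed terminator may prefer to convert
one character at a time with
.Fn wctomb
and check the return value at each step.
The following is a minimal sketch of that approach, not code taken
from the library.
The helper name
.Fn wcs_to_mbs_checked
is hypothetical, and the scratch buffer is sized with
.Dv MB_LEN_MAX
from
.In limits.h
on the assumption that a compile-time bound is wanted.
.Bd -literal -offset indent
```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Convert the wide string w into buf, which holds size bytes.
 * Returns the number of bytes stored, not counting the terminating
 * null byte, or (size_t)-1 if a character cannot be converted or
 * the result, including its terminator, does not fit.
 * wcs_to_mbs_checked is a hypothetical helper name.
 */
size_t
wcs_to_mbs_checked(char *buf, size_t size, const wchar_t *w)
{
    char tmp[MB_LEN_MAX];
    size_t used = 0;
    int n;

    (void)wctomb(NULL, 0);      /* reset any internal shift state */
    do {
        /* The null wide character is converted too; it yields */
        /* the terminating null byte. */
        n = wctomb(tmp, *w);
        if (n < 0)
            return ((size_t)-1);        /* unconvertible character */
        if ((size_t)n > size - used)
            return ((size_t)-1);        /* does not fit */
        memcpy(buf + used, tmp, (size_t)n);
        used += (size_t)n;
    } while (*w++ != 0);
    return (used - 1);
}
```
.Ed
.Pp
On failure the contents of
.Fa buf
are unspecified, so a caller would be expected to discard them.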