.\" Copyright (c) 2002-2004 Tim J. Robbins. All rights reserved.
.\" Copyright (c) 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Donn Seeley of BSDI.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"	This product includes software developed by the University of
.\"	California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     @(#)multibyte.3	8.1 (Berkeley) 6/4/93
.\" $FreeBSD$
.\"
.Dd April 8, 2004
.Dt MULTIBYTE 3
.Os
.Sh NAME
.Nm multibyte
.Nd multibyte and wide character manipulation functions
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In limits.h
.In stdlib.h
.In wchar.h
.Sh DESCRIPTION
The basic elements of some written natural languages, such as Chinese,
cannot be represented uniquely with single C
.Va char Ns s .
The C standard supports two different ways of dealing with
extended natural language encodings:
wide characters and
multibyte characters.
Wide characters are an internal representation
which allows each basic element to map
to a single object of type
.Va wchar_t .
Multibyte characters are used for input and output
and code each basic element as a sequence of C
.Va char Ns s .
Individual basic elements may map into one or more
(up to
.Dv MB_LEN_MAX )
bytes in a multibyte character.
.Pp
The current locale
.Pq Xr setlocale 3
governs the interpretation of wide and multibyte characters.
The locale category
.Dv LC_CTYPE
specifically controls this interpretation.
The
.Va wchar_t
type is wide enough to hold the largest value
in the wide character representations for all locales.
.Pp
Multibyte strings may contain
.Sq shift
indicators to switch to and from
particular modes within the given representation.
If explicit bytes are used to signal shifting,
these are not recognized as separate characters
but are lumped with a neighboring character.
There is always a distinguished
.Sq initial
shift state.
Some functions (e.g.,
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb )
maintain static shift state internally, whereas
others store it in an
.Vt mbstate_t
object passed by the caller.
Shift states are undefined after a call to
.Fn setlocale
with the
.Dv LC_CTYPE
or
.Dv LC_ALL
categories.
.Pp
For convenience in processing,
the wide character with value 0
(the null wide character)
is recognized as the wide character string terminator,
and the character with value 0
(the null byte)
is recognized as the multibyte character string terminator.
Null bytes are not permitted within multibyte characters.
.Pp
The C library provides the following functions for dealing with
multibyte characters:
.Bl -column "Description"
.It Sy Function Ta Sy Description
.It mblen Ta "get number of bytes in a character"
.It mbrlen Ta "get number of bytes in a character (restartable)"
.It mbrtowc Ta "convert a character to a wide-character code (restartable)"
.It mbsrtowcs Ta "convert a character string to a wide-character string (restartable)"
.It mbstowcs Ta "convert a character string to a wide-character string"
.It mbtowc Ta "convert a character to a wide-character code"
.It wcrtomb Ta "convert a wide-character code to a character (restartable)"
.It wcstombs Ta "convert a wide-character string to a character string"
.It wcsrtombs Ta "convert a wide-character string to a character string (restartable)"
.It wctomb Ta "convert a wide-character code to a character"
.El
.Sh SEE ALSO
.Xr mklocale 1 ,
.Xr stdio 3 ,
.Xr setlocale 3 ,
.Xr big5 5 ,
.Xr euc 5 ,
.Xr gb18030 5 ,
.Xr gb2312 5 ,
.Xr gbk 5 ,
.Xr mskanji 5 ,
.Xr utf2 5 ,
.Xr utf8 5
.Sh STANDARDS
These functions conform to
.St -isoC-99 .
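.Sh EXAMPLES
The following is a minimal sketch of how a restartable function can be used
with a caller-owned shift state.
The helper
.Fn mb_count_chars
is hypothetical and is not part of the C library.
It assumes that the caller has already selected the desired
.Dv LC_CTYPE
locale with
.Xr setlocale 3 ,
and it counts the characters in a multibyte string by calling
.Fn mbrlen
with an
.Vt mbstate_t
object that starts in the initial shift state:
.Bd -literal -offset indent
```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not a libc function): count the characters in
 * the multibyte string s under the current LC_CTYPE locale.
 * Returns (size_t)-1 if s holds an invalid or incomplete sequence.
 */
size_t
mb_count_chars(const char *s)
{
	mbstate_t st;
	size_t count, left, len;

	/* A zero-filled mbstate_t describes the initial shift state. */
	memset(&st, 0, sizeof(st));
	count = 0;
	left = strlen(s);
	while (left > 0) {
		/* Never let mbrlen() look past the remaining bytes. */
		len = mbrlen(s, left, &st);
		if (len == (size_t)-1 || len == (size_t)-2)
			return ((size_t)-1);
		if (len == 0)
			break;
		s += len;
		left -= len;
		count++;
	}
	return (count);
}
```
.Ed
.Pp
In the default C locale, calling this helper on the string
.Li abc
returns 3.
In a multibyte locale the result may be smaller than the byte length
reported by
.Xr strlen 3 .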