.\" Copyright (c) 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Donn Seeley of BSDI.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"	This product includes software developed by the University of
.\"	California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"	@(#)multibyte.3	8.1 (Berkeley) 6/4/93
.\" $FreeBSD$
.\"
.Dd June 4, 1993
.Dt MULTIBYTE 3
.Os
.Sh NAME
.Nm mblen ,
.Nm mbstowcs ,
.Nm mbtowc ,
.Nm wcstombs ,
.Nm wctomb
.Nd multibyte character support for C
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In stdlib.h
.Ft int
.Fn mblen "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn mbstowcs "wchar_t *wcstring" "const char *mbstring" "size_t nwchars"
.Ft int
.Fn mbtowc "wchar_t *wcharp" "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn wcstombs "char *mbstring" "const wchar_t *wcstring" "size_t nbytes"
.Ft int
.Fn wctomb "char *mbchar" "wchar_t wchar"
.Sh DESCRIPTION
The basic elements of some written natural languages such as Chinese
cannot be represented uniquely with single C
.Va char Ns s .
The C standard supports two different ways of dealing with
extended natural language encodings,
.Em wide
characters and
.Em multibyte
characters.
Wide characters are an internal representation
which allows each basic element to map
to a single object of type
.Va wchar_t .
Multibyte characters are used for input and output
and encode each basic element as a sequence of C
.Va char Ns s .
Individual basic elements may map into one or more
(up to
.Dv MB_LEN_MAX )
bytes in a multibyte character.
.Pp
The current locale
.Pq Xr setlocale 3
governs the interpretation of wide and multibyte characters.
The locale category
.Dv LC_CTYPE
specifically controls this interpretation.
The
.Va wchar_t
type is wide enough to hold the largest value
in the wide character representations for all locales.
.Pp
Multibyte strings may contain
.Sq shift
indicators to switch to and from
particular modes within the given representation.
If explicit bytes are used to signal shifting,
these are not recognized as separate characters
but are treated as part of a neighboring character.
There is always a distinguished
.Sq initial
shift state.
The
.Fn mbstowcs
and
.Fn wcstombs
functions assume that multibyte strings are interpreted
starting from the initial shift state.
The
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions maintain static shift state internally.
A call to any of these three functions with a null
.Fa mbchar
pointer returns nonzero if the current locale requires shift states,
zero otherwise;
if shift states are required, the shift state is reset to the initial state.
The internal shift states are undefined after a call to
.Fn setlocale
with the
.Dv LC_CTYPE
or
.Dv LC_ALL
categories.
.Pp
For convenience in processing,
the wide character with value 0
(the null wide character)
is recognized as the wide character string terminator,
and the character with value 0
(the null byte)
is recognized as the multibyte character string terminator.
Null bytes are not permitted within multibyte characters.
.Pp
The
.Fn mblen
function computes the length in bytes
of a multibyte character
.Fa mbchar .
Up to
.Fa nbytes
bytes are examined.
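.Pp
As a minimal sketch of how
.Fn mblen
might be used, the hypothetical helper
.Fn count_mbchars
(an illustration, not part of the library) steps through a string one
multibyte character at a time.
It assumes the caller has already selected a locale; in the default
.Qq C
locale every byte counts as one character.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: count the multibyte characters in the
 * null-terminated string s.  Returns -1 on an invalid sequence.
 */
long
count_mbchars(const char *s)
{
	size_t left;
	long count = 0;
	int len;

	assert(s != NULL);
	left = strlen(s);
	(void)mblen(NULL, 0);		/* reset to the initial shift state */
	while (left > 0) {
		len = mblen(s, left);
		if (len < 0)
			return (-1);	/* invalid or incomplete character */
		if (len == 0)
			break;		/* null byte: end of string */
		s += len;
		left -= (size_t)len;
		count++;
	}
	return (count);
}
```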
.Pp
The
.Fn mbtowc
function converts a multibyte character
.Fa mbchar
into a wide character and stores the result
in the object pointed to by
.Fa wcharp .
Up to
.Fa nbytes
bytes are examined.
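.Pp
The following sketch shows one way to decode a string character by
character with
.Fn mbtowc .
The helper name
.Fn decode_mbstring
and its parameters are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: decode at most max wide characters from the
 * null-terminated string s into out.  Returns the number of wide
 * characters stored, or -1 on an invalid sequence.
 */
long
decode_mbstring(const char *s, wchar_t *out, size_t max)
{
	size_t left, n = 0;
	int len;

	assert(s != NULL && out != NULL);
	left = strlen(s);
	(void)mbtowc(NULL, NULL, 0);	/* reset to the initial shift state */
	while (left > 0 && n < max) {
		len = mbtowc(&out[n], s, left);
		if (len < 0)
			return (-1);	/* invalid or incomplete character */
		if (len == 0)
			break;		/* null byte: end of string */
		s += len;
		left -= (size_t)len;
		n++;
	}
	return ((long)n);
}
```

The output array is not null-terminated by this sketch; a caller that
needs a terminator would reserve one extra element and store it itself.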
.Pp
The
.Fn wctomb
function converts a wide character
.Fa wchar
into a multibyte character and stores
the result in
.Fa mbchar .
The object pointed to by
.Fa mbchar
must be large enough to accommodate the multibyte character.
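.Pp
One way to satisfy the size requirement is to give the buffer
.Dv MB_LEN_MAX
bytes, plus one if a terminator is wanted.
The sketch below assumes exactly that; the helper
.Fn encode_wchar
is hypothetical and not part of the library.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: encode wc into buf, which must hold at least
 * MB_LEN_MAX + 1 bytes, and null-terminate the result.
 * Returns the number of bytes written before the terminator,
 * or -1 if wc has no multibyte representation.
 */
int
encode_wchar(char *buf, wchar_t wc)
{
	int len;

	assert(buf != NULL);
	(void)wctomb(NULL, L'\0');	/* reset to the initial shift state */
	len = wctomb(buf, wc);
	if (len < 0)
		return (-1);
	buf[len] = '\0';
	return (len);
}
```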
.Pp
The
.Fn mbstowcs
function converts a multibyte character string
.Fa mbstring
into a wide character string
.Fa wcstring .
No more than
.Fa nwchars
wide characters are stored.
A terminating null wide character is appended if there is room.
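.Pp
Because the terminator is only appended when there is room, a caller
usually has to check for truncation.
The sketch below shows one possible wrapper; the name
.Fn mb_to_wide
and its choice to report truncation as an error are assumptions, not
library behavior.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert src into dst, which has room for
 * dstlen wide characters, always leaving dst null-terminated on
 * truncation.  Returns 0 on success, or -1 on an invalid multibyte
 * character or when the result did not fit.
 */
int
mb_to_wide(wchar_t *dst, size_t dstlen, const char *src)
{
	size_t n;

	assert(dst != NULL && src != NULL && dstlen > 0);
	n = mbstowcs(dst, src, dstlen);
	if (n == (size_t)-1)
		return (-1);		/* invalid multibyte character */
	if (n == dstlen) {
		/* Buffer filled with no room left for the terminator. */
		dst[dstlen - 1] = L'\0';
		return (-1);
	}
	return (0);
}
```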
.Pp
The
.Fn wcstombs
function converts a wide character string
.Fa wcstring
into a multibyte character string
.Fa mbstring .
Up to
.Fa nbytes
bytes are stored in
.Fa mbstring .
Partial multibyte characters at the end of the string are not stored.
The multibyte character string is null terminated if there is room.
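.Pp
Since partial characters are never stored, passing one byte less than
the buffer size and terminating the result by hand yields a string cut
on a character boundary.
The following is a sketch under that assumption; the helper
.Fn wide_to_mb
is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert src into dst, which has room for
 * dstlen bytes, and always null-terminate the result.
 * Returns the number of bytes stored before the terminator, or
 * (size_t)-1 if a wide character has no multibyte representation.
 */
size_t
wide_to_mb(char *dst, size_t dstlen, const wchar_t *src)
{
	size_t n;

	assert(dst != NULL && src != NULL && dstlen > 0);
	/* Reserve the last byte so a terminator always fits. */
	n = wcstombs(dst, src, dstlen - 1);
	if (n == (size_t)-1) {
		dst[0] = '\0';
		return (n);
	}
	dst[n] = '\0';
	return (n);
}
```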
.Sh RETURN VALUES
If multibyte characters are not supported in the current locale,
all of these functions will return \-1 if characters can be processed,
otherwise 0.
.Pp
If
.Fa mbchar
is
.Dv NULL ,
the
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions return nonzero if shift states are supported,
zero otherwise.
If
.Fa mbchar
is valid,
then these functions return
the number of bytes processed in
.Fa mbchar ,
or \-1 if no multibyte character
could be recognized or converted.
.Pp
The
.Fn mbstowcs
function returns the number of wide characters converted,
not counting any terminating null wide character.
The
.Fn wcstombs
function returns the number of bytes converted,
not counting any terminating null byte.
If
.Fn mbstowcs
encounters an invalid multibyte character, or
.Fn wcstombs
encounters a wide character that has no multibyte representation,
the function returns
.Po Vt size_t Pc Ns \-1 .
.Sh SEE ALSO
.Xr mbrune 3 ,
.Xr rune 3 ,
.Xr setlocale 3 ,
.Xr euc 4 ,
.Xr utf2 4
.Sh STANDARDS
The
.Fn mblen ,
.Fn mbstowcs ,
.Fn mbtowc ,
.Fn wcstombs
and
.Fn wctomb
functions conform to
.St -isoC .
.Sh BUGS
The current implementation does not support shift states.