.\" Copyright (c) 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Donn Seeley of BSDI.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"	This product includes software developed by the University of
.\"	California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"	@(#)multibyte.3	8.1 (Berkeley) 6/4/93
.\" $FreeBSD$
.\"
.Dd June 4, 1993
.Dt MULTIBYTE 3
.Os
.Sh NAME
.Nm mblen ,
.Nm mbstowcs ,
.Nm mbtowc ,
.Nm wcstombs ,
.Nm wctomb
.Nd multibyte character support for C
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.Fd #include <stdlib.h>
.Ft int
.Fn mblen "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn mbstowcs "wchar_t *wcstring" "const char *mbstring" "size_t nwchars"
.Ft int
.Fn mbtowc "wchar_t *wcharp" "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn wcstombs "char *mbstring" "const wchar_t *wcstring" "size_t nbytes"
.Ft int
.Fn wctomb "char *mbchar" "wchar_t wchar"
.Sh DESCRIPTION
The basic elements of some written natural languages, such as Chinese,
cannot be represented uniquely with single C
.Va char Ns s .
The C standard supports two different ways of dealing with
extended natural language encodings:
.Em wide
characters and
.Em multibyte
characters.
Wide characters are an internal representation
which allows each basic element to map
to a single object of type
.Va wchar_t .
Multibyte characters are used for input and output
and code each basic element as a sequence of C
.Va char Ns s .
Individual basic elements may map into one or more
.Pq up to Dv MB_LEN_MAX
bytes in a multibyte character.
.Pp
The current locale
.Pq Xr setlocale 3
governs the interpretation of wide and multibyte characters.
The locale category
.Dv LC_CTYPE
specifically controls this interpretation.
The
.Va wchar_t
type is wide enough to hold the largest value
in the wide character representations for all locales.
.Pp
Multibyte strings may contain
.Sq shift
indicators to switch to and from
particular modes within the given representation.
If explicit bytes are used to signal shifting,
these are not recognized as separate characters
but are lumped with a neighboring character.
There is always a distinguished
.Sq initial
shift state.
The
.Fn mbstowcs
and
.Fn wcstombs
functions assume that multibyte strings are interpreted
starting from the initial shift state.
The
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions maintain static shift state internally.
A call with a null
.Fa mbchar
pointer returns nonzero if the current locale requires shift states,
zero otherwise;
if shift states are required, the shift state is reset to the initial state.
The internal shift states are undefined after a call to
.Fn setlocale
with the
.Dv LC_CTYPE
or
.Dv LC_ALL
categories.
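.Pp
A minimal sketch of the null-pointer convention described above
follows.
It assumes only the standard
.Fn mblen
interface; the helper name
.Fn locale_has_shift_states
is hypothetical and not part of libc.

```c
#include <stdlib.h>

/*
 * Hypothetical helper (not a libc function): returns nonzero if the
 * current LC_CTYPE locale uses a state-dependent (shift) encoding.
 * Passing a null pointer to mblen() also resets mblen()'s internal
 * shift state to the initial state.
 */
int
locale_has_shift_states(void)
{
	return (mblen(NULL, 0) != 0);
}
```

Given the limitation noted under BUGS, this helper would be expected
to return 0 in every locale on this implementation.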
.Pp
For convenience in processing,
the wide character with value 0
.Pq the null wide character
is recognized as the wide character string terminator,
and the character with value 0
.Pq the null byte
is recognized as the multibyte character string terminator.
Null bytes are not permitted within multibyte characters.
.Pp
The
.Fn mblen
function computes the length in bytes
of a multibyte character
.Fa mbchar .
Up to
.Fa nbytes
bytes are examined.
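.Pp
For illustration, the following sketch counts the characters in a
null-terminated multibyte string by stepping through it with
.Fn mblen .
The helper
.Fn count_mbchars
is hypothetical; it limits
.Fa nbytes
to the bytes remaining before the terminator so that no byte past the
end of the string is examined.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not a libc function): count the multibyte
 * characters in the null-terminated string s, interpreted in the
 * current locale.  Returns -1 if an invalid or incomplete multibyte
 * character is found.
 */
long
count_mbchars(const char *s)
{
	size_t left = strlen(s);	/* bytes before the terminator */
	long count = 0;
	int len;

	(void)mblen(NULL, 0);		/* start from the initial shift state */
	while (left > 0) {
		len = mblen(s, left);	/* examine at most the remaining bytes */
		if (len < 0)
			return (-1);
		s += len;
		left -= (size_t)len;
		count++;
	}
	return (count);
}
```

In the default C locale every byte is one character, so the string
"hello" counts as 5.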
.Pp
The
.Fn mbtowc
function converts a multibyte character
.Fa mbchar
into a wide character and stores the result
in the object pointed to by
.Fa wcharp .
Up to
.Fa nbytes
bytes are examined.
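.Pp
A minimal sketch of decoding the first character of a string with
.Fn mbtowc
follows; the helper name
.Fn decode_first
is hypothetical.
Passing the string length plus one as
.Fa nbytes
lets
.Fn mbtowc
see the terminating null byte, so an empty string yields 0
rather than an error.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not a libc function): decode the first
 * multibyte character of the null-terminated string s into *wcp.
 * Returns the number of bytes consumed, 0 if s is empty (and *wcp is
 * set to the null wide character), or -1 if s does not begin with a
 * valid multibyte character in the current locale.
 */
int
decode_first(const char *s, wchar_t *wcp)
{
	(void)mbtowc(NULL, NULL, 0);	/* reset to the initial shift state */

	/* Let mbtowc() examine every byte up to and including the terminator. */
	return (mbtowc(wcp, s, strlen(s) + 1));
}
```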
.Pp
The
.Fn wctomb
function converts a wide character
.Fa wchar
into a multibyte character and stores
the result in
.Fa mbchar .
The object pointed to by
.Fa mbchar
must be large enough to accommodate the multibyte character;
.Dv MB_CUR_MAX
bytes are always sufficient in the current locale.
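.Pp
The sketch below, with the hypothetical helper name
.Fn encode_wchar ,
sizes its buffer with
.Dv MB_LEN_MAX
so that it is large enough in any locale, and adds the terminating
null byte that
.Fn wctomb
does not store.

```c
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not a libc function): encode wc as a
 * null-terminated multibyte string in buf.  MB_LEN_MAX + 1 bytes is
 * enough in any locale, because MB_CUR_MAX never exceeds MB_LEN_MAX.
 * Returns the number of bytes in the character, or -1 if wc has no
 * multibyte representation in the current locale.
 */
int
encode_wchar(char buf[MB_LEN_MAX + 1], wchar_t wc)
{
	int len = wctomb(buf, wc);

	if (len < 0)
		return (-1);
	buf[len] = 0;	/* wctomb() does not append a terminator */
	return (len);
}
```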
.Pp
The
.Fn mbstowcs
function converts a multibyte character string
.Fa mbstring
into a wide character string
.Fa wcstring .
No more than
.Fa nwchars
wide characters are stored.
A terminating null wide character is appended if there is room.
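.Pp
Because
.Fn mbstowcs
omits the terminator when it stores exactly
.Fa nwchars
wide characters, callers that need a terminated result usually
reserve one element for it.
A minimal sketch of that pattern, using the hypothetical helper name
.Fn to_wide ,
follows.

```c
#include <stdlib.h>

/*
 * Hypothetical helper (not a libc function): convert s into dst, an
 * array of dstlen (at least 1) wide characters, always leaving dst
 * null-terminated.  Input that does not fit is truncated.  Returns the
 * number of wide characters stored, excluding the terminator, or
 * (size_t)-1 if s contains an invalid multibyte character.
 */
size_t
to_wide(wchar_t *dst, size_t dstlen, const char *s)
{
	/* Reserve one element for the terminator. */
	size_t n = mbstowcs(dst, s, dstlen - 1);

	if (n != (size_t)-1)
		dst[n] = 0;
	return (n);
}
```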
.Pp
The
.Fn wcstombs
function converts a wide character string
.Fa wcstring
into a multibyte character string
.Fa mbstring .
Up to
.Fa nbytes
bytes are stored in
.Fa mbstring .
Partial multibyte characters at the end of the string are not stored.
The multibyte character string is null terminated if there is room.
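.Pp
A minimal sketch of converting into a fixed-size byte buffer that is
always null terminated follows; the helper name
.Fn to_multibyte
is hypothetical.
One byte is reserved for the terminator, and since partial multibyte
characters are not stored, a truncated result still ends on a
character boundary.

```c
#include <stdlib.h>

/*
 * Hypothetical helper (not a libc function): convert ws into dst, a
 * buffer of dstlen (at least 1) bytes, always leaving dst
 * null-terminated.  Returns the number of bytes stored, excluding the
 * terminator, or (size_t)-1 if ws contains a wide character with no
 * multibyte representation in the current locale.
 */
size_t
to_multibyte(char *dst, size_t dstlen, const wchar_t *ws)
{
	/* Keep one byte free for the terminator. */
	size_t n = wcstombs(dst, ws, dstlen - 1);

	if (n != (size_t)-1)
		dst[n] = 0;
	return (n);
}
```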
.Sh RETURN VALUES
If multibyte characters are not supported in the current locale,
all of these functions will return \-1 if characters can be processed,
otherwise 0.
.Pp
If
.Fa mbchar
is
.Dv NULL ,
the
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions return nonzero if the current locale requires shift states,
zero otherwise.
If
.Fa mbchar
is not
.Dv NULL ,
then these functions return
the number of bytes processed in
.Fa mbchar ,
or \-1 if no multibyte character
could be recognized or converted.
The
.Fn mblen
and
.Fn mbtowc
functions return 0 if
.Fa mbchar
points to a null byte.
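.Pp
As an illustration of these return values, the following sketch
sorts a single
.Fn mblen
result into three cases.
The enumeration and the helper name
.Fn classify_mb
are hypothetical.
A return of \-1 does not distinguish an invalid sequence
from one that is merely incomplete within the first
.Fa nbytes
bytes.

```c
#include <stdlib.h>

/* Hypothetical classification of one mblen() result (not part of libc). */
enum mb_class {
	CLS_END,	/* mbchar points to a null byte */
	CLS_CHAR,	/* a complete character of *lenp bytes */
	CLS_BAD		/* invalid, or incomplete within the first n bytes */
};

enum mb_class
classify_mb(const char *mbchar, size_t n, int *lenp)
{
	int len = mblen(mbchar, n);

	*lenp = len;
	if (len == 0)
		return (CLS_END);
	if (len < 0)
		return (CLS_BAD);
	return (CLS_CHAR);
}
```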
.Pp
The
.Fn mbstowcs
function returns the number of wide characters converted,
not counting any terminating null wide character.
The
.Fn wcstombs
function returns the number of bytes converted,
not counting any terminating null byte.
If
.Fn mbstowcs
encounters an invalid multibyte character, or
.Fn wcstombs
encounters a wide character that does not correspond to a valid
multibyte character, the function returns
.Li (size_t)\-1 .
.Sh SEE ALSO
.Xr mbrune 3 ,
.Xr rune 3 ,
.Xr setlocale 3 ,
.Xr euc 4 ,
.Xr utf2 4
.Sh STANDARDS
The
.Fn mblen ,
.Fn mbstowcs ,
.Fn mbtowc ,
.Fn wcstombs
and
.Fn wctomb
functions conform to
.St -ansiC .
.Sh BUGS
The current implementation does not support shift states.