.\" Copyright (c) 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Donn Seeley of BSDI.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"	This product includes software developed by the University of
.\"	California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"	@(#)multibyte.3	8.1 (Berkeley) 6/4/93
.\"
.Dd "June 4, 1993"
.Dt MULTIBYTE 3
.Os
.Sh NAME
.Nm mblen ,
.Nm mbstowcs ,
.Nm mbtowc ,
.Nm wcstombs ,
.Nm wctomb
.Nd multibyte character support for C
.Sh SYNOPSIS
.Fd #include <stdlib.h>
.Ft int
.Fn mblen "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn mbstowcs "wchar_t *wcstring" "const char *mbstring" "size_t nwchars"
.Ft int
.Fn mbtowc "wchar_t *wcharp" "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn wcstombs "char *mbstring" "const wchar_t *wcstring" "size_t nbytes"
.Ft int
.Fn wctomb "char *mbchar" "wchar_t wchar"
59.Sh DESCRIPTION
60The basic elements of some written natural languages such as Chinese
61cannot be represented uniquely with single C
62.Va char Ns s .
63The C standard supports two different ways of dealing with
64extended natural language encodings,
65.Em wide
66characters and
67.Em multibyte
68characters.
69Wide characters are an internal representation
70which allows each basic element to map
71to a single object of type
72.Va wchar_t .
73Multibyte characters are used for input and output
74and code each basic element as a sequence of C
75.Va char Ns s .
76Individual basic elements may map into one or more
77.Pq up to Dv MB_CHAR_MAX
78bytes in a multibyte character.
.Pp
The current locale
.Pq Xr setlocale 3
governs the interpretation of wide and multibyte characters.
The locale category
.Dv LC_CTYPE
specifically controls this interpretation.
The
.Va wchar_t
type is wide enough to hold the largest value
in the wide character representations for all locales.
.Pp
Multibyte strings may contain
.Sq shift
indicators to switch to and from
particular modes within the given representation.
If explicit bytes are used to signal shifting,
these are not recognized as separate characters
but are lumped with a neighboring character.
There is always a distinguished
.Sq initial
shift state.
The
.Fn mbstowcs
and
.Fn wcstombs
functions assume that multibyte strings are interpreted
starting from the initial shift state.
The
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions maintain static shift state internally.
A call with a null
.Fa mbchar
pointer returns nonzero if the current locale requires shift states,
zero otherwise;
if shift states are required, the shift state is reset to the initial state.
The internal shift states are undefined after a call to
.Fn setlocale
with the
.Dv LC_CTYPE
or
.Dv LC_ALL
categories.
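.Pp
As an illustration of the reset behavior described above,
the following sketch returns every internal shift state to the initial state,
as might be done after changing the locale.
The helper name
.Fn reset_shift_state
is hypothetical and is not part of the library.
.Bd -literal -offset indent
#include <stdlib.h>

/*
 * Hypothetical helper: reset the shift state kept by mblen(),
 * mbtowc() and wctomb().  Returns nonzero if the current
 * locale uses shift states, zero otherwise.
 */
static int
reset_shift_state(void)
{
	int stateful;

	/* A null mbchar pointer queries and resets the state. */
	stateful = mblen(NULL, 0);
	(void)mbtowc(NULL, NULL, 0);
	(void)wctomb(NULL, 0);
	return (stateful != 0);
}
.Ed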
.Pp
For convenience in processing,
the wide character with value 0
.Pq the null wide character
is recognized as the wide character string terminator,
and the character with value 0
.Pq the null byte
is recognized as the multibyte character string terminator.
Null bytes are not permitted within multibyte characters.
.Pp
The
.Fn mblen
function computes the length in bytes
of a multibyte character
.Fa mbchar .
Up to
.Fa nbytes
bytes are examined.
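.Pp
One common use of
.Fn mblen
is counting the characters, rather than the bytes, in a multibyte string.
The following is a minimal sketch under that assumption;
the helper name
.Fn mb_count
is hypothetical.
.Bd -literal -offset indent
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return the number of multibyte
 * characters in s, or -1 if s contains an invalid sequence.
 */
static long
mb_count(const char *s)
{
	size_t left = strlen(s);
	long count = 0;
	int n;

	(void)mblen(NULL, 0);	/* start in the initial shift state */
	while (left > 0) {
		n = mblen(s, left);
		if (n <= 0)
			return (-1);	/* invalid or incomplete character */
		s += n;
		left -= (size_t)n;
		count++;
	}
	return (count);
}
.Ed
.Pp
Passing the number of bytes remaining as
.Fa nbytes
keeps
.Fn mblen
from reading past the end of the string.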
.Pp
The
.Fn mbtowc
function converts a multibyte character
.Fa mbchar
into a wide character and stores the result
in the object pointed to by
.Fa wcharp .
Up to
.Fa nbytes
bytes are examined.
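.Pp
The sketch below shows one way to walk a multibyte string with
.Fn mbtowc ,
converting one character per call.
The helper name
.Fn dump_wide
and its output format are illustrative only.
.Bd -literal -offset indent
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: print the byte length and wide
 * character code of each character in s.
 * Returns 0 on success, -1 on an invalid sequence.
 */
static int
dump_wide(const char *s)
{
	size_t left = strlen(s);
	wchar_t wc;
	int n;

	(void)mbtowc(NULL, NULL, 0);	/* reset the shift state */
	while (left > 0) {
		n = mbtowc(&wc, s, left);
		if (n <= 0)
			return (-1);	/* invalid or incomplete character */
		printf("%d byte(s): 0x%lx\en", n, (unsigned long)wc);
		s += n;
		left -= (size_t)n;
	}
	return (0);
}
.Ed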
.Pp
The
.Fn wctomb
function converts a wide character
.Fa wchar
into a multibyte character and stores
the result in
.Fa mbchar .
The object pointed to by
.Fa mbchar
must be large enough to accommodate the multibyte character.
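.Pp
A buffer of
.Dv MB_LEN_MAX
bytes
.Pq from Aq Pa limits.h
is large enough for any single multibyte character in any locale.
The following sketch relies on that bound;
the helper name
.Fn put_wide
is hypothetical.
.Bd -literal -offset indent
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: write one wide character to stdout
 * in the multibyte encoding of the current locale.
 * Returns 0 on success, -1 on failure.
 */
static int
put_wide(wchar_t wc)
{
	char buf[MB_LEN_MAX];	/* room for any one character */
	size_t len;
	int n;

	n = wctomb(buf, wc);
	if (n < 0)
		return (-1);	/* wc cannot be represented */
	len = (size_t)n;
	return (fwrite(buf, 1, len, stdout) == len ? 0 : -1);
}
.Ed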
.Pp
The
.Fn mbstowcs
function converts a multibyte character string
.Fa mbstring
into a wide character string
.Fa wcstring .
No more than
.Fa nwchars
wide characters are stored.
A terminating null wide character is appended if there is room.
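.Pp
Because every wide character is produced from at least one byte,
a buffer of
.Li strlen(mbstring) + 1
wide characters always leaves room for the terminator.
The sketch below uses that bound;
the helper name
.Fn mb_to_wide
is hypothetical.
.Bd -literal -offset indent
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a heap-allocated wide character
 * copy of s, or NULL on allocation or conversion failure.
 * The caller frees the result.
 */
static wchar_t *
mb_to_wide(const char *s)
{
	size_t max = strlen(s) + 1;	/* upper bound, with terminator */
	wchar_t *ws;

	ws = malloc(max * sizeof(*ws));
	if (ws == NULL)
		return (NULL);
	if (mbstowcs(ws, s, max) == (size_t)-1) {
		free(ws);	/* invalid multibyte sequence */
		return (NULL);
	}
	return (ws);
}
.Ed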
.Pp
The
.Fn wcstombs
function converts a wide character string
.Fa wcstring
into a multibyte character string
.Fa mbstring .
Up to
.Fa nbytes
bytes are stored in
.Fa mbstring .
Partial multibyte characters at the end of the string are not stored.
The multibyte character string is null terminated if there is room.
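.Pp
For the reverse conversion,
each wide character needs at most
.Dv MB_CUR_MAX
bytes in the current locale,
which gives a safe buffer size.
The following is a sketch under that assumption;
the helper name
.Fn wide_to_mb
is hypothetical.
.Bd -literal -offset indent
#include <stdlib.h>

/*
 * Hypothetical helper: return a heap-allocated multibyte copy
 * of ws, or NULL on allocation or conversion failure.
 * The caller frees the result.
 */
static char *
wide_to_mb(const wchar_t *ws)
{
	size_t len, max;
	char *s;

	/* Count wide characters up to the null wide character. */
	for (len = 0; ws[len] != 0; len++)
		continue;
	max = len * MB_CUR_MAX + 1;	/* worst case, with terminator */
	s = malloc(max);
	if (s == NULL)
		return (NULL);
	if (wcstombs(s, ws, max) == (size_t)-1) {
		free(s);	/* unconvertible wide character */
		return (NULL);
	}
	return (s);
}
.Ed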
.Sh RETURN VALUES
If multibyte characters are not supported in the current locale,
all of these functions will return \-1 if characters can be processed,
otherwise 0.
.Pp
If
.Fa mbchar
is
.Dv NULL ,
the
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions return nonzero if shift states are supported,
zero otherwise.
If
.Fa mbchar
is valid,
then these functions return
the number of bytes processed in
.Fa mbchar ,
or \-1 if no multibyte character
could be recognized or converted.
.Pp
The
.Fn mbstowcs
function returns the number of wide characters converted,
not counting any terminating null wide character.
The
.Fn wcstombs
function returns the number of bytes converted,
not counting any terminating null byte.
If any invalid multibyte characters are encountered,
both functions return \-1 cast to type
.Va size_t .
.Sh SEE ALSO
.Xr euc 4 ,
.Xr mbrune 3 ,
.Xr rune 3 ,
.Xr setlocale 3 ,
.Xr utf2 4
.Sh STANDARDS
The
.Fn mblen ,
.Fn mbstowcs ,
.Fn mbtowc ,
.Fn wcstombs
and
.Fn wctomb
functions conform to
.St -ansiC .
.Sh BUGS
The current implementation does not support shift states.