man/man5/charmap.5

 Copyright (c) 1992, X/Open Company Limited All Rights Reserved Portions Copyright (c) 2003, Sun Microsystems, Inc. All Rights Reserved
 Sun Microsystems, Inc. gratefully acknowledges The Open Group for permission to reproduce portions of its copyrighted documentation. Original documentation from The Open Group can be obtained online at
 http://www.opengroup.org/bookstore/.
 The Institute of Electrical and Electronics Engineers and The Open Group, have given us permission to reprint portions of their documentation. In the following statement, the phrase "this text" refers to portions of the system documentation. Portions of this text are reprinted and reproduced in electronic form in the Sun OS Reference Manual, from IEEE Std 1003.1, 2004 Edition, Standard for Information Technology -- Portable Operating System Interface (POSIX), The Open Group Base Specifications Issue 6, Copyright (C) 2001-2004 by the Institute of Electrical and Electronics Engineers, Inc and The Open Group. In the event of any discrepancy between these versions and the original IEEE and The Open Group Standard, the original IEEE and The Open Group Standard is the referee document. The original Standard can be obtained online at http://www.opengroup.org/unix/online.html.
 This notice shall appear on any product containing this material.
 The contents of this file are subject to the terms of the Common Development and Distribution License (the "License"). You may not use this file except in compliance with the License.
 You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing. See the License for the specific language governing permissions and limitations under the License.
 When distributing Covered Code, include this CDDL HEADER in each file and include the License file at usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your own identifying information: Portions Copyright [yyyy] [name of copyright owner]
 CHARMAP 5 "Dec 1, 2003"
 NAME
charmap - character set description file
 DESCRIPTION

A character set description file or charmap defines characteristics for a
coded character set. Other information about the coded character set may also
be in the file. Coded character set character values are defined using symbolic
character names followed by character encoding values.

The character set description file provides:


The capability to describe character set attributes (such as collation order or
character classes) independent of character set encoding, and using only the
characters in the portable character set. This makes it possible to create
generic localedef(1) source files for all codesets that share the
portable character set.


Standardized symbolic names for all characters in the portable character set,
making it possible to refer to any such character regardless of encoding.

 "Symbolic Names"

Each symbolic name is included in the file and is mapped to a unique encoding
value (except for those symbolic names that are shown with identical glyphs).
If the control characters commonly associated with the symbolic names in the
following table are supported by the implementation, the symbolic names and
their corresponding encoding values are included in the file. Some of the
encodings associated with the symbolic names in this table may be the same as
characters in the portable character set table.


<ACK> <DC2> <ENQ> <FS> <IS4> <SOH>
<BEL> <DC3> <EOT> <GS> <LF> <STX>
<BS> <DC4> <ESC> <HT> <NAK> <SUB>
<CAN> <DEL> <ETB> <IS1> <RS> <SYN>
<CR> <DLE> <ETX> <IS2> <SI> <US>
<DC1> <EM> <FF> <IS3> <SO> <VT>


 "Declarations"

The following declarations can precede the character definitions. Each must
consist of the symbol shown in the following list, starting in column 1,
including the surrounding brackets, followed by one or more blank characters,
followed by the value to be assigned to the symbol.
<code_set_name>

The name of the coded character set for which the character set description
file is defined.


<mb_cur_max>

The maximum number of bytes in a multi-byte character. This defaults to
1.


<mb_cur_min>

An unsigned positive integer value that defines the minimum number of bytes in
a character for the encoded character set.


<escape_char>

The escape character used to indicate that the characters following will be
interpreted in a special way, as defined later in this section. This defaults
to backslash ('\'), which is the character glyph used in all the
following text and examples, unless otherwise noted.


<comment_char>

The character that, when placed in column 1 of a charmap line, indicates that
the line is to be ignored. The default character is the number sign (#).
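
The declaration rules above can be illustrated with a minimal sketch. The
function name parse_declarations and the DEFAULTS table are hypothetical and
are not part of any system utility. Only the defaults stated in this section
are filled in, and all values are kept as strings.

```python
import re

# Defaults stated in this manual page; <mb_cur_min> and <code_set_name>
# have no default given here, so they are left unset.
DEFAULTS = {"mb_cur_max": "1", "escape_char": "\\", "comment_char": "#"}

# A declaration starts in column 1, keeps its angle brackets, and is
# followed by one or more blanks and then the value.
_DECLARATION = re.compile(
    r"<(code_set_name|mb_cur_max|mb_cur_min|escape_char|comment_char)>"
    r"[ \t]+(\S.*)"
)


def parse_declarations(lines):
    """Collect declarations that precede the CHARMAP identifier line."""
    declarations = dict(DEFAULTS)
    for line in lines:
        if line.startswith("CHARMAP"):
            break  # character definitions begin here
        match = _DECLARATION.match(line)
        if match:
            declarations[match.group(1)] = match.group(2).strip()
    return declarations


if __name__ == "__main__":
    sample = ["<code_set_name> ISO8859-1", "<mb_cur_max> 2", "CHARMAP"]
    print(parse_declarations(sample))
```

Lines that do not match one of the five symbols are skipped, which is a
simplification; a real parser would report them as errors.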


 "Format"

The character set mapping definitions will be all the lines immediately
following an identifier line containing the string CHARMAP starting in
column 1, and preceding a trailer line containing the string END
CHARMAP starting in column 1. Empty lines and lines containing a
<comment_char> in the first column will be ignored. Each non-comment line
of the character set mapping definition (that is, between the CHARMAP and
END CHARMAP lines of the file) must be in either of two forms:
"%s %s %s\n",<symbolic-name>,<encoding>,<comments>


or
"%s...%s %s %s\n",<symbolic-name>,<symbolic-name>,<encoding>,<comments>


In the first format, the line in the character set mapping definition defines a
single symbolic name and a corresponding encoding. A character following an
escape character is interpreted as itself; for example, the sequence
"<\\\>>" represents the symbolic name "\>" enclosed between
angle brackets.
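
The escape rule for symbolic names can be sketched as follows. The function
name parse_symbolic_name is hypothetical, and the sketch assumes that the
line begins with the opening angle bracket and that the escape character is
the default backslash unless another one is passed in.

```python
def parse_symbolic_name(line, escape="\\"):
    """Return (name, rest_of_line) for a line that starts with <name>.

    A character following the escape character is taken as itself, so an
    escaped '>' does not terminate the name.
    """
    if not line.startswith("<"):
        raise ValueError("line does not start with a symbolic name")
    name = []
    i = 1
    while i < len(line):
        ch = line[i]
        if ch == escape:
            i += 1
            if i >= len(line):
                raise ValueError("escape character at end of line")
            name.append(line[i])  # escaped character stands for itself
        elif ch == ">":
            return "".join(name), line[i + 1:]
        else:
            name.append(ch)
        i += 1
    raise ValueError("unterminated symbolic name")


if __name__ == "__main__":
    # The name consists of a backslash followed by '>'.
    print(parse_symbolic_name(r"<\\\>> \x41"))
```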

In the second format, the line in the character set mapping definition defines
a range of one or more symbolic names. In this form, the symbolic names must
consist of zero or more non-numeric characters, followed by an integer formed
by one or more decimal digits. The characters preceding the integer must be
identical in the two symbolic names, and the integer formed by the digits in
the second symbolic name must be equal to or greater than the integer formed by
the digits in the first name. This is interpreted as a series of symbolic names
formed from the common part and each of the integers between the first and the
second integer, inclusive. As an example, <j0101>...<j0104> is
interpreted as the symbolic names <j0101>, <j0102>, <j0103>,
and <j0104>, in that order.
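
As an illustration of the range rule, the following minimal sketch expands a
pair of symbolic names into the full series. The function name
expand_symbolic_range is hypothetical. The sketch assumes that generated
names are zero-padded to the digit count of the first name, which matches
the <j0101>...<j0104> example but is not spelled out in the text above.

```python
import re

# Zero or more non-numeric characters followed by one or more decimal digits.
_RANGE_NAME = re.compile(r"<(\D*)(\d+)>")


def expand_symbolic_range(first, last):
    """Expand '<j0101>', '<j0104>' into the list of names in the range."""
    m_first = _RANGE_NAME.fullmatch(first)
    m_last = _RANGE_NAME.fullmatch(last)
    if m_first is None or m_last is None:
        raise ValueError("names must end in one or more decimal digits")
    prefix, first_digits = m_first.groups()
    last_prefix, last_digits = m_last.groups()
    if prefix != last_prefix:
        raise ValueError("characters preceding the integer must be identical")
    start, end = int(first_digits), int(last_digits)
    if end < start:
        raise ValueError("second integer must not be less than the first")
    width = len(first_digits)  # assumption: keep the first name's padding
    return [f"<{prefix}{n:0{width}d}>" for n in range(start, end + 1)]


if __name__ == "__main__":
    print(expand_symbolic_range("<j0101>", "<j0104>"))
```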

A character set mapping definition line must exist for all symbolic names and
must define the coded character value that corresponds to the character glyph
indicated in the table, or the coded character value that corresponds with the
control character symbolic name. If the control characters commonly associated
with the symbolic names are supported by the implementation, the symbolic name
and the corresponding encoding value must be included in the file. Additional
unique symbolic names may be included. A coded character value can be
represented by more than one symbolic name.

The encoding part is expressed as one (for single-byte character values) or
more concatenated decimal, octal or hexadecimal constants in the following
formats:
"%cd%d",<escape_char>,<decimal byte value>

"%cx%x",<escape_char>,<hexadecimal byte value>

"%c%o",<escape_char>,<octal byte value>


 "Decimal Constants"

Decimal constants must be represented by two or three decimal digits, preceded
by the escape character and the lower-case letter d; for example,
\d05, \d97, or \d143. Hexadecimal constants must be
represented by two hexadecimal digits, preceded by the escape character and the
lower-case letter x; for example, \x05, \x61, or
\x8f. Octal constants must be represented by two or three octal
digits, preceded by the escape character; for example, \05, \141,
or \217. In a portable charmap file, each constant must represent an
8-bit byte. Implementations supporting other byte sizes may allow constants to
represent values larger than those that can be represented in 8-bit bytes, and
may allow additional digits in constants. When constants are concatenated for
multi-byte character values, they must be of the same type, and interpreted in
byte order from first to last with the least significant byte of the multi-byte
character specified by the last constant.
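
The constant formats can be illustrated with a minimal sketch that converts
an encoding field into bytes. The function name parse_encoding is
hypothetical. The sketch assumes the default escape character (backslash)
and the portable case in which every constant is an 8-bit byte.

```python
import re

# One constant: escape + 'd' + 2-3 decimal digits, escape + 'x' + 2 hex
# digits, or escape + 2-3 octal digits.
_CONSTANT = re.compile(r"\\(?:d(\d{2,3})|x([0-9A-Fa-f]{2})|([0-7]{2,3}))")


def parse_encoding(text):
    """Convert concatenated constants such as \\d129\\d254 into bytes."""
    values = []
    kinds = set()
    pos = 0
    while pos < len(text):
        match = _CONSTANT.match(text, pos)
        if match is None:
            raise ValueError(f"invalid constant at offset {pos}")
        dec, hexa, octal = match.groups()
        if dec is not None:
            kinds.add("decimal")
            value = int(dec, 10)
        elif hexa is not None:
            kinds.add("hexadecimal")
            value = int(hexa, 16)
        else:
            kinds.add("octal")
            value = int(octal, 8)
        if value > 255:
            raise ValueError("constant does not fit in an 8-bit byte")
        values.append(value)
        pos = match.end()
    if not values:
        raise ValueError("empty encoding")
    if len(kinds) > 1:
        raise ValueError("concatenated constants must be of the same type")
    # First constant is the most significant byte, last the least.
    return bytes(values)


if __name__ == "__main__":
    print(parse_encoding(r"\d97"), parse_encoding(r"\x61"), parse_encoding(r"\141"))
```

All three calls in the usage lines describe the same byte value, 97, written
in decimal, hexadecimal, and octal.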
 "Ranges of Symbolic Names"

In lines defining ranges of symbolic names, the encoded value is the value for
the first symbolic name in the range (the symbolic name preceding the
ellipsis). Subsequent symbolic names defined by the range will have encoding
values in increasing order. Bytes are treated as unsigned octets and carry is
propagated between the bytes as necessary to represent the range. However, a
range whose expansion produces a null byte in the second or subsequent byte of
a character should not be specified. For example, the line
<j0101>...<j0104> \d129\d254


is interpreted as:
<j0101> \d129\d254
<j0102> \d129\d255
<j0103> \d130\d00
<j0104> \d130\d01


The expanded declaration of the symbol <j0103> in the above example is an
invalid specification, because it contains a null byte in the second byte of a
character.
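
The carry behavior in this example can be reproduced with a minimal sketch.
The function names expand_range_encoding and has_embedded_null are
hypothetical. The sketch treats the encoding as an unsigned big-endian
integer, consistent with the last constant being the least significant byte.

```python
def expand_range_encoding(first_encoding, count):
    """Return `count` consecutive encodings starting at `first_encoding`.

    Bytes are treated as unsigned octets; carry propagates from the last
    (least significant) byte toward the first.
    """
    size = len(first_encoding)
    start = int.from_bytes(first_encoding, "big")
    # to_bytes raises OverflowError if the range no longer fits in `size` bytes.
    return [(start + offset).to_bytes(size, "big") for offset in range(count)]


def has_embedded_null(encoding):
    """True if the second or a subsequent byte of the character is null."""
    return 0 in encoding[1:]


if __name__ == "__main__":
    for encoding in expand_range_encoding(bytes([129, 254]), 4):
        status = "invalid" if has_embedded_null(encoding) else "ok"
        print(list(encoding), status)
```

Run against the example above, the third entry (130, 0) is the one flagged,
which corresponds to <j0103>.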

The comment field of a character set mapping definition line is optional.
 "Width Specification"

The following declarations can follow the character set mapping definitions
(after the "END CHARMAP" statement). Each consists of the keyword shown
in the following list, starting in column 1, followed by the value(s) to be
associated to the keyword, as defined below.
WIDTH

A non-negative integer value defining the column width for the printable
character in the coded character set mapping definitions. Coded character set
character values are defined using symbolic character names followed by column
width values. Defining a character with more than one WIDTH produces
undefined results. The END WIDTH keyword is used to terminate the
WIDTH definitions. Specifying the width of a non-printable character in a
WIDTH declaration produces undefined results.


WIDTH_DEFAULT

A non-negative integer value defining the default column width for any
printable character not listed by one of the WIDTH keywords. If no
WIDTH_DEFAULT keyword is included in the charmap, the default character
width is 1.


Example:

After the "END CHARMAP" statement, a syntax for a width definition would
be:
WIDTH
<A> 1
<B> 1
<C>...<Z> 1
...
<foo1>...<foon> 2
...
END WIDTH


In this example, the numerical code point values represented by the symbols
<A> and <B> are assigned a width of 1. The code point values
<C> to <Z> inclusive, that is, <C>, <D>, <E>,
and so on, are also assigned a width of 1. Using <A>...<Z> would
have required fewer lines, but the alternative was shown to demonstrate
flexibility. The keyword WIDTH_DEFAULT could have been added as
appropriate.
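
A width section of this shape could be read with a minimal sketch like the
following. The function name parse_width_section is hypothetical. The sketch
records each range as a (first, last, width) entry without resolving it to
code points, and it assumes that symbolic names do not themselves contain an
escaped ellipsis.

```python
def parse_width_section(lines):
    """Return (entries, default_width) from the lines after END CHARMAP.

    Each entry is (first_name, last_name, width); a single name is stored
    with first_name == last_name.
    """
    entries = []
    default_width = 1  # used when no WIDTH_DEFAULT keyword is present
    in_width = False
    for raw in lines:
        line = raw.rstrip()
        if not line:
            continue
        if line == "WIDTH":
            in_width = True
        elif line == "END WIDTH":
            in_width = False
        elif line.startswith("WIDTH_DEFAULT"):
            default_width = int(line.split()[1])
        elif in_width:
            names, width = line.rsplit(None, 1)
            first, ellipsis, last = names.partition("...")
            entries.append((first, last if ellipsis else first, int(width)))
    return entries, default_width


if __name__ == "__main__":
    sample = ["WIDTH", "<A> 1", "<B> 1", "<C>...<Z> 1", "END WIDTH"]
    print(parse_width_section(sample))
```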
 SEE ALSO

locale(1), localedef(1), nl_langinfo(3C),
extensions(5), locale(5)