Network Working Group                                       P. Faltstrom
Request for Comments: 3490                                         Cisco
Category: Standards Track                                     P. Hoffman
                                                              IMC & VPNC
                                                             A. Costello
                                                             UC Berkeley
                                                              March 2003


         Internationalizing Domain Names in Applications (IDNA)

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

Abstract

   Until now, there has been no standard method for domain names to use
   characters outside the ASCII repertoire.  This document defines
   internationalized domain names (IDNs) and a mechanism called
   Internationalizing Domain Names in Applications (IDNA) for handling
   them in a standard fashion.  IDNs use characters drawn from a large
   repertoire (Unicode), but IDNA allows the non-ASCII characters to be
   represented using only the ASCII characters already allowed in
   so-called host names today.  This backward-compatible representation
   is required in existing protocols like DNS, so that IDNs can be
   introduced with no changes to the existing infrastructure.  IDNA is
   only meant for processing domain names, not free text.

Table of Contents

   1. Introduction
      1.1 Problem Statement
      1.2 Limitations of IDNA
      1.3 Brief overview for application developers
   2. Terminology
   3. Requirements and applicability
      3.1 Requirements
      3.2 Applicability
         3.2.1. DNS resource records
         3.2.2. Non-domain-name data types stored in domain names
   4. Conversion operations
      4.1 ToASCII
      4.2 ToUnicode
   5. ACE prefix
   6. Implications for typical applications using DNS
      6.1 Entry and display in applications
      6.2 Applications and resolver libraries
      6.3 DNS servers
      6.4 Avoiding exposing users to the raw ACE encoding
      6.5 DNSSEC authentication of IDN domain names
   7. Name server considerations
   8. Root server considerations
   9. References
      9.1 Normative References
      9.2 Informative References
   10. Security Considerations
   11. IANA Considerations
   12. Authors' Addresses
   13. Full Copyright Statement

1. Introduction

   IDNA works by allowing applications to use certain ASCII name labels
   (beginning with a special prefix) to represent non-ASCII name labels.
   Lower-layer protocols need not be aware of this; therefore IDNA does
   not depend on changes to any infrastructure.  In particular, IDNA
   does not depend on any changes to DNS servers, resolvers, or protocol
   elements, because the ASCII name service provided by the existing DNS
   is entirely sufficient for IDNA.

   This document does not require any applications to conform to IDNA,
   but applications can elect to use IDNA in order to support IDN while
   maintaining interoperability with existing infrastructure.  If an
   application wants to use non-ASCII characters in domain names, IDNA
   is the only currently-defined option.  Adding IDNA support to an
   existing application entails changes to the application only, and
   leaves room for flexibility in the user interface.

   A great deal of the discussion of IDN solutions has focused on
   transition issues and how IDN will work in a world where not all of
   the components have been updated.  Proposals that were not chosen by
   the IDN Working Group would depend on user applications, resolvers,
   and DNS servers being updated in order for a user to use an
   internationalized domain name.  Rather than rely on widespread
   updating of all components, IDNA depends on updates to user
   applications only; no changes are needed to the DNS protocol or any
   DNS servers or the resolvers on users' computers.

1.1 Problem Statement

   The IDNA specification solves the problem of extending the repertoire
   of characters that can be used in domain names to include the Unicode
   repertoire (with some restrictions).

   IDNA does not extend the service offered by DNS to the applications.
   Instead, the applications (and, by implication, the users) continue
   to see an exact-match lookup service.  Either there is a single
   exactly-matching name or there is no match.  This model has served
   the existing applications well, but it requires, with or without
   internationalized domain names, that users know the exact spelling of
   the domain names that the users type into applications such as web
   browsers and mail user agents.  The introduction of the larger
   repertoire of characters potentially makes the set of misspellings
   larger, especially given that in some cases the same appearance, for
   example on a business card, might visually match several Unicode code
   points or several sequences of code points.

   IDNA allows the graceful introduction of IDNs not only by avoiding
   upgrades to existing infrastructure (such as DNS servers and mail
   transport agents), but also by allowing some rudimentary use of IDNs
   in applications by using the ASCII representation of the non-ASCII
   name labels.  While such names are very user-unfriendly to read and
   type, and hence are not suitable for user input, they allow (for
   instance) replying to email and clicking on URLs even though the
   domain name displayed is incomprehensible to the user.  In order to
   allow user-friendly input and output of the IDNs, the applications
   need to be modified to conform to this specification.

   IDNA uses the Unicode character repertoire, which avoids the
   significant delays that would be inherent in waiting for a different
   and specific character set to be defined for IDN purposes by some
   other standards developing organization.

1.2 Limitations of IDNA

   The IDNA protocol does not solve all linguistic issues with users
   inputting names in different scripts.  Many important language-based
   and script-based mappings are not covered in IDNA and need to be
   handled outside the protocol.  For example, names that are entered in
   a mix of traditional and simplified Chinese characters will not be
   mapped to a single canonical name.  As another example, Scandinavian
   names that are entered with U+00F6 (LATIN SMALL LETTER O WITH
   DIAERESIS) will not be mapped to U+00F8 (LATIN SMALL LETTER O WITH
   STROKE).

   An example of an important issue that is not considered in detail in
   IDNA is how to provide a high probability that a user who is entering
   a domain name based on visual information (such as from a business
   card or billboard) or aural information (such as from a telephone or
   radio) would correctly enter the IDN.  Similar issues exist for ASCII
   domain names, for example the possible visual confusion between the
   letter 'O' and the digit zero, but the introduction of the larger
   repertoire of characters creates more opportunities for
   similar-looking and similar-sounding names.  Note that this is a
   complex issue relating to languages, input methods on computers, and
   so on.  Furthermore, the kind of matching and searching necessary for
   a high probability of success would not fit the role of the DNS and
   its exact matching function.

1.3 Brief overview for application developers

   Applications can use IDNA to support internationalized domain names
   anywhere that ASCII domain names are already supported, including DNS
   master files and resolver interfaces.  (Applications can also define
   protocols and interfaces that support IDNs directly using non-ASCII
   representations.  IDNA does not prescribe any particular
   representation for new protocols, but it still defines which names
   are valid and how they are compared.)

   The IDNA protocol is contained completely within applications.  It is
   not a client-server or peer-to-peer protocol: everything is done
   inside the application itself.  When used with a DNS resolver
   library, IDNA is inserted as a "shim" between the application and the
   resolver library.  When used for writing names into a DNS zone, IDNA
   is used just before the name is committed to the zone.

   There are two operations described in section 4 of this document:

   -  The ToASCII operation is used before sending an IDN to something
      that expects ASCII names (such as a resolver) or writing an IDN
      into a place that expects ASCII names (such as a DNS master file).

   -  The ToUnicode operation is used when displaying names to users,
      for example names obtained from a DNS zone.

   It is important to note that the ToASCII operation can fail.  If it
   fails when processing a domain name, that domain name cannot be used
   as an internationalized domain name and the application has to have
   some method of dealing with this failure.
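
   As a non-normative illustration of the two operations and of the
   failure case, the sketch below uses the ToASCII and ToUnicode
   functions from Python's standard encodings.idna module, which
   implements this specification.  The helper name to_ascii_or_none
   and the sample labels are assumptions made for the example; they
   are not part of IDNA.

```python
import encodings.idna  # standard-library implementation of RFC 3490


def to_ascii_or_none(label):
    """Return the ASCII form of one label, or None if ToASCII fails."""
    try:
        return encodings.idna.ToASCII(label).decode("ascii")
    except UnicodeError:
        # ToASCII failed: this label cannot be part of an IDN.
        return None


ace = to_ascii_or_none("bücher")        # non-ASCII label becomes an ACE label
plain = to_ascii_or_none("example")     # ASCII label is left unaltered
too_long = to_ascii_or_none("a" * 64)   # exceeds the 63-octet label limit
shown = encodings.idna.ToUnicode(ace)   # form suitable for display

print(ace)       # xn--bcher-kva
print(plain)     # example
print(too_long)  # None
print(shown)     # bücher
```

   Returning None is only one way of surfacing the failure; an
   application could equally report an error to the user.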

   IDNA requires that implementations process input strings with
   Nameprep [NAMEPREP], which is a profile of Stringprep [STRINGPREP],
   and then with Punycode [PUNYCODE].  Implementations of IDNA MUST
   fully implement Nameprep and Punycode; neither Nameprep nor Punycode
   is optional.
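
   The ordering of the two steps can be sketched as follows.  This is
   a simplified illustration, not the full ToASCII procedure of
   section 4, which performs additional checks.  It assumes Python's
   encodings.idna.nameprep function and built-in "punycode" codec;
   the sample label is an assumption, and the prefix value is the one
   specified in section 5.

```python
import encodings.idna  # provides nameprep(); "punycode" is a built-in codec

ACE_PREFIX = b"xn--"   # ACE prefix value specified in section 5

label = "Bücher"                             # sample input (assumption)
prepared = encodings.idna.nameprep(label)    # first Nameprep ...
encoded = prepared.encode("punycode")        # ... then Punycode
ace_label = ACE_PREFIX + encoded

print(prepared)    # bücher
print(ace_label)   # b'xn--bcher-kva'
```

   Running Nameprep first matters: it case-folds and normalizes the
   label, so "Bücher" and "bücher" yield the same Punycode output.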

2. Terminology

   The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED",
   and "MAY" in this document are to be interpreted as described in BCP
   14, RFC 2119 [RFC2119].

   A code point is an integer value associated with a character in a
   coded character set.

   Unicode [UNICODE] is a coded character set containing tens of
   thousands of characters.  A single Unicode code point is denoted by
   "U+" followed by four to six hexadecimal digits, while a range of
   Unicode code points is denoted by two hexadecimal numbers separated
   by "..", with no prefixes.
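
   The single-code-point notation can be reproduced with a short
   helper.  The function name code_point_notation is an assumption
   made for this illustration.

```python
def code_point_notation(ch):
    """Format one character as "U+" plus at least four hex digits."""
    return "U+{:04X}".format(ord(ch))


print(code_point_notation("ö"))           # U+00F6
print(code_point_notation("\u3002"))      # U+3002
print(code_point_notation("\U00010000"))  # U+10000
```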

   ASCII means US-ASCII [USASCII], a coded character set containing 128
   characters associated with code points in the range 0..7F.  Unicode
   is an extension of ASCII: it includes all the ASCII characters and
   associates them with the same code points.

   The term "LDH code points" is defined in this document to mean the
   code points associated with ASCII letters, digits, and the
   hyphen-minus; that is, U+002D, 30..39, 41..5A, and 61..7A.  "LDH" is
   an abbreviation for "letters, digits, hyphen".
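
   The LDH ranges translate directly into a membership test.  The
   names LDH_CODE_POINTS and is_ldh below are assumptions made for
   this sketch.

```python
# U+002D, 30..39, 41..5A and 61..7A, exactly as listed above.
LDH_CODE_POINTS = frozenset(
    [0x2D]
    + list(range(0x30, 0x3A))   # digits 0-9
    + list(range(0x41, 0x5B))   # letters A-Z
    + list(range(0x61, 0x7B))   # letters a-z
)


def is_ldh(text):
    """Return True if every code point in text is an LDH code point."""
    return all(ord(ch) in LDH_CODE_POINTS for ch in text)


print(len(LDH_CODE_POINTS))     # 63
print(is_ldh("xn--bcher-kva"))  # True
print(is_ldh("foo_bar"))        # False
```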

   [STD13] talks about "domain names" and "host names", but many people
   use the terms interchangeably.  Further, because [STD13] was not
   terribly clear, many people who are sure they know the exact
   definitions of each of these terms disagree on the definitions.  In
   this document the term "domain name" is used in general.  This
   document explicitly cites [STD3] whenever referring to the host name
   syntax restrictions defined therein.

   A label is an individual part of a domain name.  Labels are usually
   shown separated by dots; for example, the domain name
   "www.example.com" is composed of three labels: "www", "example", and
   "com".  (The zero-length root label described in [STD13], which can
   be explicit as in "www.example.com." or implicit as in
   "www.example.com", is not considered a label in this specification.)
   IDNA extends the set of usable characters in labels that are text.
   For the rest of this document, the term "label" is shorthand for
   "text label", and "every label" means "every text label".
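
   A minimal sketch of this notion of a label follows.  The function
   name text_labels is an assumption, and only U+002E is treated as a
   separator here; the other separators are covered by requirement 1
   in section 3.1.

```python
def text_labels(name):
    """Split a dotted name into labels, ignoring the root label."""
    labels = name.split(".")
    if labels and labels[-1] == "":
        # An explicit trailing dot denotes the zero-length root label,
        # which is not counted as a label in this specification.
        labels.pop()
    return labels


print(text_labels("www.example.com"))    # ['www', 'example', 'com']
print(text_labels("www.example.com."))   # ['www', 'example', 'com']
```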

   An "internationalized label" is a label to which the ToASCII
   operation (see section 4) can be applied without failing (with the
   UseSTD3ASCIIRules flag unset).  This implies that every ASCII label
   that satisfies the [STD13] length restriction is an internationalized
   label.  Therefore the term "internationalized label" is a
   generalization, embracing both old ASCII labels and new non-ASCII
   labels.  Although most Unicode characters can appear in
   internationalized labels, ToASCII will fail for some input strings,
   and such strings are not valid internationalized labels.

   An "internationalized domain name" (IDN) is a domain name in which
   every label is an internationalized label.  This implies that every
   ASCII domain name is an IDN (which implies that it is possible for a
   name to be an IDN without it containing any non-ASCII characters).
   This document does not attempt to define an "internationalized host
   name".  Just as has been the case with ASCII names, some DNS zone
   administrators may impose restrictions, beyond those imposed by DNS
   or IDNA, on the characters or strings that may be registered as
   labels in their zones.  Such restrictions have no impact on the
   syntax or semantics of DNS protocol messages; a query for a name that
   matches no records will yield the same response regardless of the
   reason why it is not in the zone.  Clients issuing queries or
   interpreting responses cannot be assumed to have any knowledge of
   zone-specific restrictions or conventions.

   In IDNA, equivalence of labels is defined in terms of the ToASCII
   operation, which constructs an ASCII form for a given label, whether
   or not the label was already an ASCII label.  Labels are defined to
   be equivalent if and only if their ASCII forms produced by ToASCII
   match using a case-insensitive ASCII comparison.  ASCII labels
   already have a notion of equivalence: upper case and lower case are
   considered equivalent.  The IDNA notion of equivalence is an
   extension of that older notion.  Equivalent labels in IDNA are
   treated as alternate forms of the same label, just as "foo" and "Foo"
   are treated as alternate forms of the same label.
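
   The equivalence rule can be sketched as below, assuming Python's
   encodings.idna.ToASCII as the ToASCII implementation.  The function
   name labels_equivalent is an assumption; the function lets
   UnicodeError propagate if ToASCII fails for either label.

```python
import encodings.idna


def labels_equivalent(a, b):
    """Compare the ASCII forms of two labels, ignoring ASCII case."""
    ascii_a = encodings.idna.ToASCII(a)   # bytes
    ascii_b = encodings.idna.ToASCII(b)   # bytes
    # bytes.lower() only affects ASCII letters, which is what the
    # case-insensitive ASCII comparison calls for.
    return ascii_a.lower() == ascii_b.lower()


print(labels_equivalent("foo", "Foo"))               # True
print(labels_equivalent("Bücher", "XN--BCHER-KVA"))  # True
print(labels_equivalent("bücher", "bucher"))         # False
```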

   To allow internationalized labels to be handled by existing
   applications, IDNA uses an "ACE label" (ACE stands for ASCII
   Compatible Encoding).  An ACE label is an internationalized label
   that can be rendered in ASCII and is equivalent to an
   internationalized label that cannot be rendered in ASCII.  Given any
   internationalized label that cannot be rendered in ASCII, the ToASCII
   operation will convert it to an equivalent ACE label (whereas an
   ASCII label will be left unaltered by ToASCII).  ACE labels are
   unsuitable for display to users.  The ToUnicode operation will
   convert any label to an equivalent non-ACE label.  In fact, an ACE
   label is formally defined to be any label that the ToUnicode
   operation would alter (whereas non-ACE labels are left unaltered by
   ToUnicode).  Every ACE label begins with the ACE prefix specified in
   section 5.  The ToASCII and ToUnicode operations are specified in
   section 4.

   The "ACE prefix" is defined in this document to be a string of ASCII
   characters that appears at the beginning of every ACE label.  It is
   specified in section 5.
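
   The formal definition of an ACE label suggests the following test.
   It is a sketch that assumes Python's encodings.idna.ToUnicode; the
   function name is_ace_label is an assumption.  Note that the Python
   function raises UnicodeError in cases where it cannot convert a
   label, so the sketch treats an exception as "left unaltered".

```python
import encodings.idna


def is_ace_label(label):
    """Return True if ToUnicode would alter the given label."""
    try:
        return encodings.idna.ToUnicode(label) != label
    except UnicodeError:
        # Treated here as "ToUnicode leaves the label unaltered".
        return False


print(is_ace_label("xn--bcher-kva"))  # True
print(is_ace_label("example"))        # False
print(is_ace_label("bücher"))         # False
```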

   A "domain name slot" is defined in this document to be a protocol
   element or a function argument or a return value (and so on)
   explicitly designated for carrying a domain name.  Examples of domain
   name slots include: the QNAME field of a DNS query; the name argument
   of the gethostbyname() library function; the part of an email address
   following the at-sign (@) in the From: field of an email message
   header; and the host portion of the URI in the src attribute of an
   HTML <IMG> tag.  General text that just happens to contain a domain
   name is not a domain name slot; for example, a domain name appearing
   in the plain text body of an email message is not occupying a domain
   name slot.

   An "IDN-aware domain name slot" is defined in this document to be a
   domain name slot explicitly designated for carrying an
   internationalized domain name as defined in this document.  The
   designation may be static (for example, in the specification of the
   protocol or interface) or dynamic (for example, as a result of
   negotiation in an interactive session).

   An "IDN-unaware domain name slot" is defined in this document to be
   any domain name slot that is not an IDN-aware domain name slot.
   Obviously, this includes any domain name slot whose specification
   predates IDNA.
374*ae771770SStanislav Sedov
375*ae771770SStanislav Sedov3. Requirements and applicability
376*ae771770SStanislav Sedov
377*ae771770SStanislav Sedov3.1 Requirements
378*ae771770SStanislav Sedov
379*ae771770SStanislav Sedov   IDNA conformance means adherence to the following four requirements:
380*ae771770SStanislav Sedov
381*ae771770SStanislav Sedov   1) Whenever dots are used as label separators, the following
382*ae771770SStanislav Sedov      characters MUST be recognized as dots: U+002E (full stop), U+3002
383*ae771770SStanislav Sedov      (ideographic full stop), U+FF0E (fullwidth full stop), U+FF61
384*ae771770SStanislav Sedov      (halfwidth ideographic full stop).
385*ae771770SStanislav Sedov
386*ae771770SStanislav Sedov   2) Whenever a domain name is put into an IDN-unaware domain name slot
387*ae771770SStanislav Sedov      (see section 2), it MUST contain only ASCII characters.  Given an
388*ae771770SStanislav Sedov      internationalized domain name (IDN), an equivalent domain name
389*ae771770SStanislav Sedov      satisfying this requirement can be obtained by applying the
390*ae771770SStanislav Sedov
391*ae771770SStanislav Sedov
392*ae771770SStanislav Sedov
393*ae771770SStanislav Sedov
394*ae771770SStanislav SedovFaltstrom, et al.           Standards Track                     [Page 7]
395*ae771770SStanislav Sedov
396*ae771770SStanislav SedovRFC 3490                          IDNA                        March 2003
397*ae771770SStanislav Sedov
398*ae771770SStanislav Sedov
399*ae771770SStanislav Sedov      ToASCII operation (see section 4) to each label and, if dots are
400*ae771770SStanislav Sedov      used as label separators, changing all the label separators to
401*ae771770SStanislav Sedov      U+002E.
402*ae771770SStanislav Sedov
403*ae771770SStanislav Sedov   3) ACE labels obtained from domain name slots SHOULD be hidden from
404*ae771770SStanislav Sedov      users when it is known that the environment can handle the non-ACE
405*ae771770SStanislav Sedov      form, except when the ACE form is explicitly requested.  When it
406*ae771770SStanislav Sedov      is not known whether or not the environment can handle the non-ACE
407*ae771770SStanislav Sedov      form, the application MAY use the non-ACE form (which might fail,
408*ae771770SStanislav Sedov      such as by not being displayed properly), or it MAY use the ACE
409*ae771770SStanislav Sedov      form (which will look unintelligle to the user).  Given an
410*ae771770SStanislav Sedov      internationalized domain name, an equivalent domain name
411*ae771770SStanislav Sedov      containing no ACE labels can be obtained by applying the ToUnicode
412*ae771770SStanislav Sedov      operation (see section 4) to each label.  When requirements 2 and
413*ae771770SStanislav Sedov      3 both apply, requirement 2 takes precedence.
414*ae771770SStanislav Sedov
415*ae771770SStanislav Sedov   4) Whenever two labels are compared, they MUST be considered to match
416*ae771770SStanislav Sedov      if and only if they are equivalent, that is, their ASCII forms
417*ae771770SStanislav Sedov      (obtained by applying ToASCII) match using a case-insensitive
418*ae771770SStanislav Sedov      ASCII comparison.  Whenever two names are compared, they MUST be
419*ae771770SStanislav Sedov      considered to match if and only if their corresponding labels
420*ae771770SStanislav Sedov      match, regardless of whether the names use the same forms of label
421*ae771770SStanislav Sedov      separators.
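
   As an illustration of requirement 4, the following Python sketch
   compares two names label by label.  It assumes the "idna" codec of
   the Python standard library, which applies ToASCII with
   UseSTD3ASCIIRules unset; the names labels_match and names_match are
   hypothetical and are not defined by IDNA.

```python
import re

# U+002E, U+3002, U+FF0E and U+FF61 are all recognized as dots (section 3.1).
DOTS = re.compile("[\u002e\u3002\uff0e\uff61]")


def labels_match(a: str, b: str) -> bool:
    """Compare two labels by their ASCII forms, ignoring ASCII case."""
    # Encoding one dot-free label with the "idna" codec applies ToASCII.
    # A label for which ToASCII fails raises UnicodeError.
    return a.encode("idna").lower() == b.encode("idna").lower()


def names_match(x: str, y: str) -> bool:
    """Compare two names label by label, whatever separators they use."""
    xs, ys = DOTS.split(x), DOTS.split(y)
    return len(xs) == len(ys) and all(map(labels_match, xs, ys))
```

   With these definitions, a name written with non-ASCII labels and
   ideographic full stops compares equal to its ACE form written with
   U+002E, because only the ASCII forms of the labels are compared.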
422*ae771770SStanislav Sedov
423*ae771770SStanislav Sedov3.2 Applicability
424*ae771770SStanislav Sedov
425*ae771770SStanislav Sedov   IDNA is applicable to all domain names in all domain name slots
426*ae771770SStanislav Sedov   except where it is explicitly excluded.
427*ae771770SStanislav Sedov
428*ae771770SStanislav Sedov   This implies that IDNA is applicable to many protocols that predate
429*ae771770SStanislav Sedov   IDNA.  Note that IDNs occupying domain name slots in those protocols
430*ae771770SStanislav Sedov   MUST be in ASCII form (see section 3.1, requirement 2).
431*ae771770SStanislav Sedov
432*ae771770SStanislav Sedov3.2.1. DNS resource records
433*ae771770SStanislav Sedov
434*ae771770SStanislav Sedov   IDNA does not apply to domain names in the NAME and RDATA fields of
435*ae771770SStanislav Sedov   DNS resource records whose CLASS is not IN.  This exclusion applies
436*ae771770SStanislav Sedov   to every non-IN class, present and future, except where future
437*ae771770SStanislav Sedov   standards override this exclusion by explicitly inviting the use of
438*ae771770SStanislav Sedov   IDNA.
439*ae771770SStanislav Sedov
440*ae771770SStanislav Sedov   There are currently no other exclusions on the applicability of IDNA
441*ae771770SStanislav Sedov   to DNS resource records; it depends entirely on the CLASS, and not on
442*ae771770SStanislav Sedov   the TYPE.  This will remain true, even as new types are defined,
443*ae771770SStanislav Sedov   unless there is a compelling reason for a new type to complicate
444*ae771770SStanislav Sedov   matters by imposing type-specific rules.
445*ae771770SStanislav Sedov
455*ae771770SStanislav Sedov3.2.2. Non-domain-name data types stored in domain names
456*ae771770SStanislav Sedov
457*ae771770SStanislav Sedov   Although IDNA enables the representation of non-ASCII characters in
458*ae771770SStanislav Sedov   domain names, that does not imply that IDNA enables the
459*ae771770SStanislav Sedov   representation of non-ASCII characters in other data types that are
460*ae771770SStanislav Sedov   stored in domain names.  For example, an email address local part is
461*ae771770SStanislav Sedov   sometimes stored in a domain label (hostmaster@example.com would be
462*ae771770SStanislav Sedov   represented as hostmaster.example.com in the RDATA field of an SOA
463*ae771770SStanislav Sedov   record).  IDNA does not update the existing email standards, which
464*ae771770SStanislav Sedov   allow only ASCII characters in local parts.  Therefore, unless the
465*ae771770SStanislav Sedov   email standards are revised to invite the use of IDNA for local
466*ae771770SStanislav Sedov   parts, a domain label that holds the local part of an email address
467*ae771770SStanislav Sedov   SHOULD NOT begin with the ACE prefix, and even if it does, it is to
468*ae771770SStanislav Sedov   be interpreted literally as a local part that happens to begin with
469*ae771770SStanislav Sedov   the ACE prefix.
470*ae771770SStanislav Sedov
471*ae771770SStanislav Sedov4. Conversion operations
472*ae771770SStanislav Sedov
   An application converts a domain name before it is put into an
   IDN-unaware slot or displayed to a user.  This section specifies the
   steps to perform in the conversion, and the ToASCII and ToUnicode
   operations.
476*ae771770SStanislav Sedov
477*ae771770SStanislav Sedov   The input to ToASCII or ToUnicode is a single label that is a
478*ae771770SStanislav Sedov   sequence of Unicode code points (remember that all ASCII code points
479*ae771770SStanislav Sedov   are also Unicode code points).  If a domain name is represented using
480*ae771770SStanislav Sedov   a character set other than Unicode or US-ASCII, it will first need to
481*ae771770SStanislav Sedov   be transcoded to Unicode.
482*ae771770SStanislav Sedov
483*ae771770SStanislav Sedov   Starting from a whole domain name, the steps that an application
484*ae771770SStanislav Sedov   takes to do the conversions are:
485*ae771770SStanislav Sedov
486*ae771770SStanislav Sedov   1) Decide whether the domain name is a "stored string" or a "query
487*ae771770SStanislav Sedov      string" as described in [STRINGPREP].  If this conversion follows
488*ae771770SStanislav Sedov      the "queries" rule from [STRINGPREP], set the flag called
489*ae771770SStanislav Sedov      "AllowUnassigned".
490*ae771770SStanislav Sedov
491*ae771770SStanislav Sedov   2) Split the domain name into individual labels as described in
492*ae771770SStanislav Sedov      section 3.1.  The labels do not include the separator.
493*ae771770SStanislav Sedov
494*ae771770SStanislav Sedov   3) For each label, decide whether or not to enforce the restrictions
495*ae771770SStanislav Sedov      on ASCII characters in host names [STD3].  (Applications already
496*ae771770SStanislav Sedov      faced this choice before the introduction of IDNA, and can
497*ae771770SStanislav Sedov      continue to make the decision the same way they always have; IDNA
498*ae771770SStanislav Sedov      makes no new recommendations regarding this choice.)  If the
499*ae771770SStanislav Sedov      restrictions are to be enforced, set the flag called
500*ae771770SStanislav Sedov      "UseSTD3ASCIIRules" for that label.
501*ae771770SStanislav Sedov
511*ae771770SStanislav Sedov   4) Process each label with either the ToASCII or the ToUnicode
512*ae771770SStanislav Sedov      operation as appropriate.  Typically, you use the ToASCII
513*ae771770SStanislav Sedov      operation if you are about to put the name into an IDN-unaware
514*ae771770SStanislav Sedov      slot, and you use the ToUnicode operation if you are displaying
515*ae771770SStanislav Sedov      the name to a user; section 3.1 gives greater detail on the
516*ae771770SStanislav Sedov      applicable requirements.
517*ae771770SStanislav Sedov
518*ae771770SStanislav Sedov   5) If ToASCII was applied in step 4 and dots are used as label
519*ae771770SStanislav Sedov      separators, change all the label separators to U+002E (full stop).
520*ae771770SStanislav Sedov
521*ae771770SStanislav Sedov   The following two subsections define the ToASCII and ToUnicode
522*ae771770SStanislav Sedov   operations that are used in step 4.
523*ae771770SStanislav Sedov
524*ae771770SStanislav Sedov   This description of the protocol uses specific procedure names, names
525*ae771770SStanislav Sedov   of flags, and so on, in order to facilitate the specification of the
526*ae771770SStanislav Sedov   protocol.  These names, as well as the actual steps of the
527*ae771770SStanislav Sedov   procedures, are not required of an implementation.  In fact, any
528*ae771770SStanislav Sedov   implementation which has the same external behavior as specified in
529*ae771770SStanislav Sedov   this document conforms to this specification.
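
   The whole-name steps above can be sketched in Python as follows.
   This is a minimal sketch, not a reference implementation.  It
   assumes the ToASCII and ToUnicode helpers in the standard library
   module encodings.idna, which the Python documentation describes as
   treating AllowUnassigned as set and UseSTD3ASCIIRules as unset, so
   steps 1 and 3 are fixed rather than chosen by the caller.  The
   function names are hypothetical.

```python
import re
from encodings import idna

# U+002E, U+3002, U+FF0E and U+FF61 are all recognized as dots (section 3.1).
DOTS = re.compile("[\u002e\u3002\uff0e\uff61]")


def _split(name: str):
    """Step 2: split into labels; report whether a trailing dot was present."""
    labels = DOTS.split(name)
    trailing = len(labels) > 1 and labels[-1] == ""
    return (labels[:-1] if trailing else labels), trailing


def domain_to_ascii(name: str) -> str:
    """For a name about to be put into an IDN-unaware slot."""
    labels, trailing = _split(name)
    # Step 4: ToASCII per label; raises UnicodeError if any label fails.
    ascii_labels = [idna.ToASCII(label).decode("ascii") for label in labels]
    # Step 5: U+002E becomes the only label separator.
    return ".".join(ascii_labels) + ("." if trailing else "")


def _to_unicode(label: str) -> str:
    # The stdlib helper raises where ToUnicode is specified to return its
    # input unaltered, so the sketch restores that behavior here.
    try:
        return idna.ToUnicode(label)
    except UnicodeError:
        return label


def domain_to_unicode(name: str) -> str:
    """For a name about to be displayed to a user."""
    labels, trailing = _split(name)
    # Joining with U+002E is a simplification; step 5 only requires it
    # when ToASCII was applied.
    shown = ".".join(_to_unicode(label) for label in labels)
    return shown + ("." if trailing else "")
```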
530*ae771770SStanislav Sedov
531*ae771770SStanislav Sedov4.1 ToASCII
532*ae771770SStanislav Sedov
533*ae771770SStanislav Sedov   The ToASCII operation takes a sequence of Unicode code points that
534*ae771770SStanislav Sedov   make up one label and transforms it into a sequence of code points in
535*ae771770SStanislav Sedov   the ASCII range (0..7F).  If ToASCII succeeds, the original sequence
536*ae771770SStanislav Sedov   and the resulting sequence are equivalent labels.
537*ae771770SStanislav Sedov
538*ae771770SStanislav Sedov   It is important to note that the ToASCII operation can fail.  ToASCII
539*ae771770SStanislav Sedov   fails if any step of it fails.  If any step of the ToASCII operation
540*ae771770SStanislav Sedov   fails on any label in a domain name, that domain name MUST NOT be
541*ae771770SStanislav Sedov   used as an internationalized domain name.  The method for dealing
542*ae771770SStanislav Sedov   with this failure is application-specific.
543*ae771770SStanislav Sedov
544*ae771770SStanislav Sedov   The inputs to ToASCII are a sequence of code points, the
545*ae771770SStanislav Sedov   AllowUnassigned flag, and the UseSTD3ASCIIRules flag.  The output of
546*ae771770SStanislav Sedov   ToASCII is either a sequence of ASCII code points or a failure
547*ae771770SStanislav Sedov   condition.
548*ae771770SStanislav Sedov
549*ae771770SStanislav Sedov   ToASCII never alters a sequence of code points that are all in the
550*ae771770SStanislav Sedov   ASCII range to begin with (although it could fail).  Applying the
551*ae771770SStanislav Sedov   ToASCII operation multiple times has exactly the same effect as
552*ae771770SStanislav Sedov   applying it just once.
553*ae771770SStanislav Sedov
554*ae771770SStanislav Sedov   ToASCII consists of the following steps:
555*ae771770SStanislav Sedov
556*ae771770SStanislav Sedov   1. If the sequence contains any code points outside the ASCII range
557*ae771770SStanislav Sedov      (0..7F) then proceed to step 2, otherwise skip to step 3.
558*ae771770SStanislav Sedov
567*ae771770SStanislav Sedov   2. Perform the steps specified in [NAMEPREP] and fail if there is an
568*ae771770SStanislav Sedov      error.  The AllowUnassigned flag is used in [NAMEPREP].
569*ae771770SStanislav Sedov
570*ae771770SStanislav Sedov   3. If the UseSTD3ASCIIRules flag is set, then perform these checks:
571*ae771770SStanislav Sedov
572*ae771770SStanislav Sedov     (a) Verify the absence of non-LDH ASCII code points; that is, the
573*ae771770SStanislav Sedov         absence of 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F.
574*ae771770SStanislav Sedov
575*ae771770SStanislav Sedov     (b) Verify the absence of leading and trailing hyphen-minus; that
576*ae771770SStanislav Sedov         is, the absence of U+002D at the beginning and end of the
577*ae771770SStanislav Sedov         sequence.
578*ae771770SStanislav Sedov
579*ae771770SStanislav Sedov   4. If the sequence contains any code points outside the ASCII range
580*ae771770SStanislav Sedov      (0..7F) then proceed to step 5, otherwise skip to step 8.
581*ae771770SStanislav Sedov
582*ae771770SStanislav Sedov   5. Verify that the sequence does NOT begin with the ACE prefix.
583*ae771770SStanislav Sedov
584*ae771770SStanislav Sedov   6. Encode the sequence using the encoding algorithm in [PUNYCODE] and
585*ae771770SStanislav Sedov      fail if there is an error.
586*ae771770SStanislav Sedov
587*ae771770SStanislav Sedov   7. Prepend the ACE prefix.
588*ae771770SStanislav Sedov
589*ae771770SStanislav Sedov   8. Verify that the number of code points is in the range 1 to 63
590*ae771770SStanislav Sedov      inclusive.
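
   The eight steps can be followed one by one in a short Python sketch.
   It assumes two pieces of the Python standard library: the nameprep
   helper in encodings.idna for step 2 (documented as behaving as if
   AllowUnassigned were set, so the sketch takes no such flag) and the
   "punycode" codec for step 6.  The names to_ascii and ToASCIIError
   are hypothetical.

```python
from encodings.idna import nameprep

ACE_PREFIX = "xn--"


class ToASCIIError(ValueError):
    """Raised when any step of the sketch fails."""


def to_ascii(label: str, use_std3_ascii_rules: bool = False) -> str:
    # Step 1: only sequences with non-ASCII code points go through step 2.
    if any(ord(c) > 0x7F for c in label):
        # Step 2: Nameprep (the stdlib helper assumes AllowUnassigned).
        try:
            label = nameprep(label)
        except UnicodeError as exc:
            raise ToASCIIError(f"nameprep failed: {exc}") from exc

    # Step 3: optional host name checks.
    if use_std3_ascii_rules:
        for c in label:
            cp = ord(c)
            # (a) non-LDH ASCII: 0..2C, 2E..2F, 3A..40, 5B..60, 7B..7F
            if (cp <= 0x2C or 0x2E <= cp <= 0x2F or 0x3A <= cp <= 0x40
                    or 0x5B <= cp <= 0x60 or 0x7B <= cp <= 0x7F):
                raise ToASCIIError(f"non-LDH code point U+{cp:04X}")
        # (b) no leading or trailing hyphen-minus
        if label.startswith("-") or label.endswith("-"):
            raise ToASCIIError("leading or trailing hyphen-minus")

    # Step 4: anything still non-ASCII needs steps 5 to 7.
    if any(ord(c) > 0x7F for c in label):
        # Step 5: the sequence must not already begin with the ACE prefix.
        if label[:4].lower() == ACE_PREFIX:
            raise ToASCIIError("label already begins with the ACE prefix")
        # Steps 6 and 7: Punycode-encode, then prepend the ACE prefix.
        label = ACE_PREFIX + label.encode("punycode").decode("ascii")

    # Step 8: the result must hold 1 to 63 code points.
    if not 1 <= len(label) <= 63:
        raise ToASCIIError("label length outside 1..63")
    return label
```

   Note that an all-ASCII label skips steps 2 and 5 to 7 and is
   therefore returned unaltered, including its capitalization, and that
   applying to_ascii to its own output changes nothing.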
591*ae771770SStanislav Sedov
592*ae771770SStanislav Sedov4.2 ToUnicode
593*ae771770SStanislav Sedov
594*ae771770SStanislav Sedov   The ToUnicode operation takes a sequence of Unicode code points that
595*ae771770SStanislav Sedov   make up one label and returns a sequence of Unicode code points.  If
596*ae771770SStanislav Sedov   the input sequence is a label in ACE form, then the result is an
597*ae771770SStanislav Sedov   equivalent internationalized label that is not in ACE form, otherwise
598*ae771770SStanislav Sedov   the original sequence is returned unaltered.
599*ae771770SStanislav Sedov
600*ae771770SStanislav Sedov   ToUnicode never fails.  If any step fails, then the original input
601*ae771770SStanislav Sedov   sequence is returned immediately in that step.
602*ae771770SStanislav Sedov
603*ae771770SStanislav Sedov   The ToUnicode output never contains more code points than its input.
604*ae771770SStanislav Sedov   Note that the number of octets needed to represent a sequence of code
605*ae771770SStanislav Sedov   points depends on the particular character encoding used.
606*ae771770SStanislav Sedov
607*ae771770SStanislav Sedov   The inputs to ToUnicode are a sequence of code points, the
608*ae771770SStanislav Sedov   AllowUnassigned flag, and the UseSTD3ASCIIRules flag.  The output of
609*ae771770SStanislav Sedov   ToUnicode is always a sequence of Unicode code points.
610*ae771770SStanislav Sedov
611*ae771770SStanislav Sedov   1. If all code points in the sequence are in the ASCII range (0..7F)
612*ae771770SStanislav Sedov      then skip to step 3.
613*ae771770SStanislav Sedov
623*ae771770SStanislav Sedov   2. Perform the steps specified in [NAMEPREP] and fail if there is an
624*ae771770SStanislav Sedov      error.  (If step 3 of ToASCII is also performed here, it will not
625*ae771770SStanislav Sedov      affect the overall behavior of ToUnicode, but it is not
626*ae771770SStanislav Sedov      necessary.)  The AllowUnassigned flag is used in [NAMEPREP].
627*ae771770SStanislav Sedov
628*ae771770SStanislav Sedov   3. Verify that the sequence begins with the ACE prefix, and save a
629*ae771770SStanislav Sedov      copy of the sequence.
630*ae771770SStanislav Sedov
631*ae771770SStanislav Sedov   4. Remove the ACE prefix.
632*ae771770SStanislav Sedov
633*ae771770SStanislav Sedov   5. Decode the sequence using the decoding algorithm in [PUNYCODE] and
634*ae771770SStanislav Sedov      fail if there is an error.  Save a copy of the result of this
635*ae771770SStanislav Sedov      step.
636*ae771770SStanislav Sedov
637*ae771770SStanislav Sedov   6. Apply ToASCII.
638*ae771770SStanislav Sedov
639*ae771770SStanislav Sedov   7. Verify that the result of step 6 matches the saved copy from step
640*ae771770SStanislav Sedov      3, using a case-insensitive ASCII comparison.
641*ae771770SStanislav Sedov
642*ae771770SStanislav Sedov   8. Return the saved copy from step 5.
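
   A corresponding sketch of ToUnicode is given below.  It assumes the
   nameprep and ToASCII helpers of the Python standard library module
   encodings.idna (with AllowUnassigned treated as set and
   UseSTD3ASCIIRules as unset) and the "punycode" codec.  The name
   to_unicode is hypothetical.  Every failure path returns the original
   input, which is how the sketch models "ToUnicode never fails".

```python
from encodings.idna import ToASCII, nameprep

ACE_PREFIX = "xn--"


def to_unicode(label: str) -> str:
    original = label
    try:
        # Steps 1 and 2: Nameprep is needed only for non-ASCII input.
        if any(ord(c) > 0x7F for c in label):
            label = nameprep(label)
        # Step 3: check for the ACE prefix (any capitalization), keep a copy.
        if label[:4].lower() != ACE_PREFIX:
            return original
        saved = label
        # Steps 4 and 5: remove the prefix and Punycode-decode the rest.
        decoded = label[4:].encode("ascii").decode("punycode")
        # Step 6: apply ToASCII (here the stdlib helper, which returns bytes).
        round_trip = ToASCII(decoded).decode("ascii")
        # Step 7: the round trip must reproduce the saved copy.
        if round_trip.lower() != saved.lower():
            return original
        # Step 8: return the decoded sequence from step 5.
        return decoded
    except UnicodeError:
        # Any failed step returns the input sequence unaltered.
        return original
```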
643*ae771770SStanislav Sedov
644*ae771770SStanislav Sedov5. ACE prefix
645*ae771770SStanislav Sedov
646*ae771770SStanislav Sedov   The ACE prefix, used in the conversion operations (section 4), is two
647*ae771770SStanislav Sedov   alphanumeric ASCII characters followed by two hyphen-minuses.  It
648*ae771770SStanislav Sedov   cannot be any of the prefixes already used in earlier documents,
649*ae771770SStanislav Sedov   which includes the following: "bl--", "bq--", "dq--", "lq--", "mq--",
650*ae771770SStanislav Sedov   "ra--", "wq--" and "zq--".  The ToASCII and ToUnicode operations MUST
651*ae771770SStanislav Sedov   recognize the ACE prefix in a case-insensitive manner.
652*ae771770SStanislav Sedov
653*ae771770SStanislav Sedov   The ACE prefix for IDNA is "xn--" or any capitalization thereof.
654*ae771770SStanislav Sedov
655*ae771770SStanislav Sedov   This means that an ACE label might be "xn--de-jg4avhby1noc0d", where
656*ae771770SStanislav Sedov   "de-jg4avhby1noc0d" is the part of the ACE label that is generated by
657*ae771770SStanislav Sedov   the encoding steps in [PUNYCODE].
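
   The following Python sketch shows a case-insensitive prefix test and
   separates the example label above into its prefix and its
   [PUNYCODE] part.  The helper names are hypothetical; the "punycode"
   codec is the one shipped with the Python standard library.

```python
ACE_PREFIX = "xn--"


def has_ace_prefix(label: str) -> bool:
    """True if the label starts with "xn--" in any capitalization."""
    return label[:len(ACE_PREFIX)].lower() == ACE_PREFIX


def punycode_part(label: str) -> str:
    """Return what follows the ACE prefix."""
    if not has_ace_prefix(label):
        raise ValueError("label does not begin with the ACE prefix")
    return label[len(ACE_PREFIX):]


part = punycode_part("xn--de-jg4avhby1noc0d")
# The stdlib "punycode" codec reverses the encoding step of [PUNYCODE].
decoded = part.encode("ascii").decode("punycode")
```

   The prefix test alone does not establish that a label is an ACE
   label; that requires the full ToUnicode round trip of section 4.2.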
658*ae771770SStanislav Sedov
659*ae771770SStanislav Sedov   While all ACE labels begin with the ACE prefix, not all labels
660*ae771770SStanislav Sedov   beginning with the ACE prefix are necessarily ACE labels.  Non-ACE
661*ae771770SStanislav Sedov   labels that begin with the ACE prefix will confuse users and SHOULD
662*ae771770SStanislav Sedov   NOT be allowed in DNS zones.
663*ae771770SStanislav Sedov
679*ae771770SStanislav Sedov6. Implications for typical applications using DNS
680*ae771770SStanislav Sedov
681*ae771770SStanislav Sedov   In IDNA, applications perform the processing needed to input
682*ae771770SStanislav Sedov   internationalized domain names from users, display internationalized
683*ae771770SStanislav Sedov   domain names to users, and process the inputs and outputs from DNS
684*ae771770SStanislav Sedov   and other protocols that carry domain names.
685*ae771770SStanislav Sedov
686*ae771770SStanislav Sedov   The components and interfaces between them can be represented
687*ae771770SStanislav Sedov   pictorially as:
688*ae771770SStanislav Sedov
689*ae771770SStanislav Sedov                    +------+
690*ae771770SStanislav Sedov                    | User |
691*ae771770SStanislav Sedov                    +------+
692*ae771770SStanislav Sedov                       ^
693*ae771770SStanislav Sedov                       | Input and display: local interface methods
694*ae771770SStanislav Sedov                       | (pen, keyboard, glowing phosphorus, ...)
695*ae771770SStanislav Sedov   +-------------------|-------------------------------+
696*ae771770SStanislav Sedov   |                   v                               |
697*ae771770SStanislav Sedov   |          +-----------------------------+          |
698*ae771770SStanislav Sedov   |          |        Application          |          |
699*ae771770SStanislav Sedov   |          |   (ToASCII and ToUnicode    |          |
700*ae771770SStanislav Sedov   |          |      operations may be      |          |
701*ae771770SStanislav Sedov   |          |        called here)         |          |
702*ae771770SStanislav Sedov   |          +-----------------------------+          |
703*ae771770SStanislav Sedov   |                   ^        ^                      | End system
704*ae771770SStanislav Sedov   |                   |        |                      |
705*ae771770SStanislav Sedov   | Call to resolver: |        | Application-specific |
706*ae771770SStanislav Sedov   |              ACE  |        | protocol:            |
707*ae771770SStanislav Sedov   |                   v        | ACE unless the       |
708*ae771770SStanislav Sedov   |           +----------+     | protocol is updated  |
709*ae771770SStanislav Sedov   |           | Resolver |     | to handle other      |
710*ae771770SStanislav Sedov   |           +----------+     | encodings            |
711*ae771770SStanislav Sedov   |                 ^          |                      |
712*ae771770SStanislav Sedov   +-----------------|----------|----------------------+
713*ae771770SStanislav Sedov       DNS protocol: |          |
714*ae771770SStanislav Sedov                 ACE |          |
715*ae771770SStanislav Sedov                     v          v
716*ae771770SStanislav Sedov          +-------------+    +---------------------+
717*ae771770SStanislav Sedov          | DNS servers |    | Application servers |
718*ae771770SStanislav Sedov          +-------------+    +---------------------+
719*ae771770SStanislav Sedov
720*ae771770SStanislav Sedov   The box labeled "Application" is where the application splits a
721*ae771770SStanislav Sedov   domain name into labels, sets the appropriate flags, and performs the
722*ae771770SStanislav Sedov   ToASCII and ToUnicode operations.  This is described in section 4.
723*ae771770SStanislav Sedov
735*ae771770SStanislav Sedov6.1 Entry and display in applications
736*ae771770SStanislav Sedov
737*ae771770SStanislav Sedov   Applications can accept domain names using any character set or sets
738*ae771770SStanislav Sedov   desired by the application developer, and can display domain names in
739*ae771770SStanislav Sedov   any charset.  That is, the IDNA protocol does not affect the
740*ae771770SStanislav Sedov   interface between users and applications.
741*ae771770SStanislav Sedov
   An IDNA-aware application can accept and display internationalized
   domain names in two formats: in the internationalized character
   set(s) supported by the application, and as ACE labels.  ACE labels
   that are displayed or input MUST always include the ACE prefix.
   Applications MAY allow input and display of ACE labels, but are not
   encouraged to do so except as an interface for special purposes,
   possibly for debugging, or to cope with display limitations as
   described in section 6.4.  ACE encoding is opaque and ugly, and
   should thus only be exposed to users who absolutely need it.  Because
   name labels encoded as ACE name labels can be rendered either as the
   encoded ASCII characters or the proper decoded characters, the
   application MAY have an option for the user to select the preferred
   method of display; if it does, rendering the ACE SHOULD NOT be the
   default.
756*ae771770SStanislav Sedov
757*ae771770SStanislav Sedov   Domain names are often stored and transported in many places.  For
758*ae771770SStanislav Sedov   example, they are part of documents such as mail messages and web
759*ae771770SStanislav Sedov   pages.  They are transported in many parts of many protocols, such as
760*ae771770SStanislav Sedov   both the control commands and the RFC 2822 body parts of SMTP, and
761*ae771770SStanislav Sedov   the headers and the body content in HTTP.  It is important to
762*ae771770SStanislav Sedov   remember that domain names appear both in domain name slots and in
763*ae771770SStanislav Sedov   the content that is passed over protocols.
764*ae771770SStanislav Sedov
765*ae771770SStanislav Sedov   In protocols and document formats that define how to handle
766*ae771770SStanislav Sedov   specification or negotiation of charsets, labels can be encoded in
767*ae771770SStanislav Sedov   any charset allowed by the protocol or document format.  If a
768*ae771770SStanislav Sedov   protocol or document format only allows one charset, the labels MUST
769*ae771770SStanislav Sedov   be given in that charset.
770*ae771770SStanislav Sedov
771*ae771770SStanislav Sedov   In any place where a protocol or document format allows transmission
772*ae771770SStanislav Sedov   of the characters in internationalized labels, internationalized
773*ae771770SStanislav Sedov   labels SHOULD be transmitted using whatever character encoding and
774*ae771770SStanislav Sedov   escape mechanism that the protocol or document format uses at that
775*ae771770SStanislav Sedov   place.
776*ae771770SStanislav Sedov
777*ae771770SStanislav Sedov   All protocols that use domain name slots already have the capacity
778*ae771770SStanislav Sedov   for handling domain names in the ASCII charset.  Thus, ACE labels
779*ae771770SStanislav Sedov   (internationalized labels that have been processed with the ToASCII
780*ae771770SStanislav Sedov   operation) can inherently be handled by those protocols.
781*ae771770SStanislav Sedov
791*ae771770SStanislav Sedov6.2 Applications and resolver libraries
792*ae771770SStanislav Sedov
793*ae771770SStanislav Sedov   Applications normally use functions in the operating system when they
794*ae771770SStanislav Sedov   resolve DNS queries.  Those functions in the operating system are
795*ae771770SStanislav Sedov   often called "the resolver library", and the applications communicate
796*ae771770SStanislav Sedov   with the resolver libraries through a programming interface (API).
797*ae771770SStanislav Sedov
798*ae771770SStanislav Sedov   Because these resolver libraries today expect only domain names in
799*ae771770SStanislav Sedov   ASCII, applications MUST prepare labels that are passed to the
800*ae771770SStanislav Sedov   resolver library using the ToASCII operation.  Labels received from
801*ae771770SStanislav Sedov   the resolver library contain only ASCII characters; internationalized
802*ae771770SStanislav Sedov   labels that cannot be represented directly in ASCII use the ACE form.
803*ae771770SStanislav Sedov   ACE labels always include the ACE prefix.
804*ae771770SStanislav Sedov
805*ae771770SStanislav Sedov   An operating system might have a set of libraries for performing the
806*ae771770SStanislav Sedov   ToASCII operation.  The input to such a library might be in one or
807*ae771770SStanislav Sedov   more charsets that are used in applications (UTF-8 and UTF-16 are
808*ae771770SStanislav Sedov   likely candidates for almost any operating system, and script-
809*ae771770SStanislav Sedov   specific charsets are likely for localized operating systems).
810*ae771770SStanislav Sedov
811*ae771770SStanislav Sedov   IDNA-aware applications MUST be able to work with both non-
812*ae771770SStanislav Sedov   internationalized labels (those that conform to [STD13] and [STD3])
813*ae771770SStanislav Sedov   and internationalized labels.
814*ae771770SStanislav Sedov
   It is expected that future versions of the resolver libraries will
   be able to accept domain names in charsets other than ASCII, and
   application developers might one day pass domain names not only in
   Unicode but also in local scripts to a new API for the resolver
   libraries in the operating system.  Thus the ToASCII and ToUnicode
   operations might be performed inside these new versions of the
   resolver libraries.
822*ae771770SStanislav Sedov
823*ae771770SStanislav Sedov   Domain names passed to resolvers or put into the question section of
824*ae771770SStanislav Sedov   DNS requests follow the rules for "queries" from [STRINGPREP].
825*ae771770SStanislav Sedov
826*ae771770SStanislav Sedov6.3 DNS servers
827*ae771770SStanislav Sedov
828*ae771770SStanislav Sedov   Domain names stored in zones follow the rules for "stored strings"
829*ae771770SStanislav Sedov   from [STRINGPREP].
830*ae771770SStanislav Sedov
831*ae771770SStanislav Sedov   For internationalized labels that cannot be represented directly in
832*ae771770SStanislav Sedov   ASCII, DNS servers MUST use the ACE form produced by the ToASCII
833*ae771770SStanislav Sedov   operation.  All IDNs served by DNS servers MUST contain only ASCII
834*ae771770SStanislav Sedov   characters.
835*ae771770SStanislav Sedov
836*ae771770SStanislav Sedov   If a signaling system which makes negotiation possible between old
837*ae771770SStanislav Sedov   and new DNS clients and servers is standardized in the future, the
838*ae771770SStanislav Sedov   encoding of the query in the DNS protocol itself can be changed from
847*ae771770SStanislav Sedov   ACE to something else, such as UTF-8.  The question whether or not
848*ae771770SStanislav Sedov   this should be used is, however, a separate problem and is not
849*ae771770SStanislav Sedov   discussed in this memo.
850*ae771770SStanislav Sedov
851*ae771770SStanislav Sedov6.4 Avoiding exposing users to the raw ACE encoding
852*ae771770SStanislav Sedov
853*ae771770SStanislav Sedov   Any application that might show the user a domain name obtained from
854*ae771770SStanislav Sedov   a domain name slot, such as from gethostbyaddr or part of a mail
855*ae771770SStanislav Sedov   header, will need to be updated if it is to prevent users from seeing
856*ae771770SStanislav Sedov   the ACE.
857*ae771770SStanislav Sedov
858*ae771770SStanislav Sedov   If an application decodes an ACE name using ToUnicode but cannot show
859*ae771770SStanislav Sedov   all of the characters in the decoded name, such as if the name
860*ae771770SStanislav Sedov   contains characters that the output system cannot display, the
861*ae771770SStanislav Sedov   application SHOULD show the name in ACE format (which always includes
862*ae771770SStanislav Sedov   the ACE prefix) instead of displaying the name with the replacement
863*ae771770SStanislav Sedov   character (U+FFFD).  This is to make it easier for the user to
864*ae771770SStanislav Sedov   transfer the name correctly to other programs.  Programs that by
865*ae771770SStanislav Sedov   default show the ACE form when they cannot show all the characters in
866*ae771770SStanislav Sedov   a name label SHOULD also have a mechanism to show the name that is
867*ae771770SStanislav Sedov   produced by the ToUnicode operation with as many characters as
868*ae771770SStanislav Sedov   possible and replacement characters in the positions where characters
869*ae771770SStanislav Sedov   cannot be displayed.
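
   The display rule above can be sketched in Python.  The sketch
   assumes that "the output system can display a character" can be
   approximated by "the character can be encoded in the output
   charset", which is a simplification, and it uses the "idna" codec of
   the Python standard library as ToUnicode.  The names display_form
   and lossy_form are hypothetical.

```python
def display_form(ace_label: str, output_charset: str) -> str:
    """Pick the form of one ACE label to show by default."""
    try:
        # Decoding one dot-free label with the "idna" codec applies ToUnicode.
        decoded = ace_label.encode("ascii").decode("idna")
    except UnicodeError:
        return ace_label
    try:
        decoded.encode(output_charset)
    except UnicodeEncodeError:
        # Some character cannot be shown: prefer the ACE form to U+FFFD.
        return ace_label
    return decoded


def lossy_form(ace_label: str, output_charset: str) -> str:
    """Alternative view: displayable characters, U+FFFD everywhere else."""
    decoded = ace_label.encode("ascii").decode("idna")
    shown = []
    for ch in decoded:
        try:
            ch.encode(output_charset)
            shown.append(ch)
        except UnicodeEncodeError:
            shown.append("\ufffd")
    return "".join(shown)
```

   An application would use display_form by default and offer
   lossy_form as the additional mechanism described above.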

   The ToUnicode operation does not alter labels that are not valid ACE
   labels, even if they begin with the ACE prefix.  After ToUnicode has
   been applied, if a label still begins with the ACE prefix, then it is
   not a valid ACE label, and is not equivalent to any of the
   intermediate Unicode strings constructed by ToUnicode.
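
   A minimal Python sketch of this check follows.  It assumes the input
   is a single ASCII label taken from a domain name slot; the helper
   names to_unicode and is_valid_ace are hypothetical.  Python's "idna"
   codec raises an error where ToUnicode is defined to return its input
   unchanged, so the wrapper restores the never-fails behavior.

```python
ACE_PREFIX = "xn--"


def to_unicode(label: str) -> str:
    """ToUnicode for one ASCII label; returns the input on any failure."""
    try:
        return label.encode("ascii").decode("idna")
    except UnicodeError:
        # Bad Punycode or a failed round trip: leave the label untouched.
        return label


def is_valid_ace(label: str) -> bool:
    """True if the label has the ACE prefix and ToUnicode altered it."""
    has_prefix = label.lower().startswith(ACE_PREFIX)
    return has_prefix and not to_unicode(label).lower().startswith(ACE_PREFIX)
```

   For example, "xn--abc-" begins with the ACE prefix but is left
   unchanged: its Punycode part decodes to the ASCII string "abc", and
   applying ToASCII to "abc" does not reproduce "xn--abc-".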

6.5 DNSSEC authentication of IDN domain names

   DNS Security [RFC2535] is a method for supplying cryptographic
   verification information along with DNS messages.  Public Key
   Cryptography is used in conjunction with digital signatures to
   provide a means for a requester of domain information to authenticate
   the source of the data.  This ensures that it can be traced back to a
   trusted source, either directly, or via a chain of trust linking the
   source of the information to the top of the DNS hierarchy.

   IDNA specifies that all internationalized domain names served by DNS
   servers that cannot be represented directly in ASCII must use the ACE
   form produced by the ToASCII operation.  This operation must be
   performed prior to a zone being signed by the private key for that
   zone.  Because of this ordering, it is important to recognize that
   DNSSEC authenticates the ASCII domain name, not the Unicode form or
   the mapping between the Unicode form and the ASCII form.  In the
   presence of DNSSEC, this is the name that MUST be signed in the zone
   and MUST be validated against.

   One consequence of this for sites deploying IDNA in the presence of
   DNSSEC is that any special purpose proxies or forwarders used to
   transform user input into IDNs must be earlier in the resolution flow
   than DNSSEC authenticating nameservers for DNSSEC to work.

7. Name server considerations

   Existing DNS servers do not know the IDNA rules for handling non-
   ASCII forms of IDNs, and therefore need to be shielded from them.
   All existing channels through which names can enter a DNS server
   database (for example, master files [STD13] and DNS update messages
   [RFC2136]) are IDN-unaware because they predate IDNA, and therefore
   requirement 2 of section 3.1 of this document provides the needed
   shielding, by ensuring that internationalized domain names entering
   DNS server databases through such channels have already been
   converted to their equivalent ASCII forms.
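
   The conversion that has to happen before a name reaches such a
   channel might look like the following Python sketch.  It relies on
   the "idna" codec of the standard library for ToASCII; the function
   names and the record layout are assumptions made for illustration
   and are not taken from any particular name server.

```python
def to_master_file_name(name: str) -> str:
    """Convert a possibly internationalized name to its ASCII form."""
    # The codec applies ToASCII to each label and rejoins them with ".".
    return name.encode("idna").decode("ascii")


def owner_line(name: str, address: str) -> str:
    """Format a hypothetical A record line for a master file."""
    return f"{to_master_file_name(name)}. IN A {address}"
```

   Only the output of to_master_file_name, never the Unicode form, is
   written to the IDN-unaware channel.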

   It is imperative that there be only one ASCII encoding for a
   particular domain name.  Because of the design of the ToASCII and
   ToUnicode operations, there are no ACE labels that decode to ASCII
   labels, and therefore name servers cannot contain multiple ASCII
   encodings of the same domain name.

   [RFC2181] explicitly allows domain labels to contain octets beyond
   the ASCII range (0..7F), and this document does not change that.
   Note, however, that there is no defined interpretation of octets
   80..FF as characters.  If labels containing these octets are returned
   to applications, unpredictable behavior could result.  The ASCII form
   defined by ToASCII is the only standard representation for
   internationalized labels in the current DNS protocol.

8. Root server considerations

   IDNs are likely to be somewhat longer than current domain names, so
   the bandwidth needed by the root servers is likely to go up by a
   small amount.  Also, queries and responses for IDNs will probably be
   somewhat longer than typical queries today, so more queries and
   responses may be forced to go to TCP instead of UDP.

9. References

9.1 Normative References

   [RFC2119]    Bradner, S., "Key words for use in RFCs to Indicate
                Requirement Levels", BCP 14, RFC 2119, March 1997.

   [STRINGPREP] Hoffman, P. and M. Blanchet, "Preparation of
                Internationalized Strings ("stringprep")", RFC 3454,
                December 2002.

   [NAMEPREP]   Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
                Profile for Internationalized Domain Names (IDN)", RFC
                3491, March 2003.

   [PUNYCODE]   Costello, A., "Punycode: A Bootstring encoding of
                Unicode for use with Internationalized Domain Names in
                Applications (IDNA)", RFC 3492, March 2003.

   [STD3]       Braden, R., "Requirements for Internet Hosts --
                Communication Layers", STD 3, RFC 1122, and
                "Requirements for Internet Hosts -- Application and
                Support", STD 3, RFC 1123, October 1989.

   [STD13]      Mockapetris, P., "Domain names - concepts and
                facilities", STD 13, RFC 1034 and "Domain names -
                implementation and specification", STD 13, RFC 1035,
                November 1987.

9.2 Informative References

   [RFC2535]    Eastlake, D., "Domain Name System Security Extensions",
                RFC 2535, March 1999.

   [RFC2181]    Elz, R. and R. Bush, "Clarifications to the DNS
                Specification", RFC 2181, July 1997.

   [UAX9]       Unicode Standard Annex #9, The Bidirectional Algorithm,
                <http://www.unicode.org/unicode/reports/tr9/>.

   [UNICODE]    The Unicode Consortium. The Unicode Standard, Version
                3.2.0 is defined by The Unicode Standard, Version 3.0
                (Reading, MA, Addison-Wesley, 2000. ISBN 0-201-61633-5),
                as amended by the Unicode Standard Annex #27: Unicode
                3.1 (http://www.unicode.org/reports/tr27/) and by the
                Unicode Standard Annex #28: Unicode 3.2
                (http://www.unicode.org/reports/tr28/).

   [USASCII]    Cerf, V., "ASCII format for Network Interchange", RFC
                20, October 1969.

10. Security Considerations

   Security on the Internet partly relies on the DNS.  Thus, any change
   to the characteristics of the DNS can change the security of much of
   the Internet.

   This memo describes an algorithm which encodes characters that are
   not valid according to STD3 and STD13 into octet values that are
   valid.  No security issues such as string length increases or new
   allowed values are introduced by the encoding process or the use of
   these encoded values, apart from those introduced by the ACE encoding
   itself.

   Domain names are used by users to identify and connect to Internet
   servers.  The security of the Internet is compromised if a user
   entering a single internationalized name is connected to different
   servers based on different interpretations of the internationalized
   domain name.

   When systems use local character sets other than ASCII and Unicode,
   this specification leaves the problem of transcoding between the
   local character set and Unicode up to the application.  If different
   applications (or different versions of one application) implement
   different transcoding rules, they could interpret the same name
   differently and contact different servers.  This problem is not
   solved by security protocols like TLS that do not take local
   character sets into account.
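
   The transcoding hazard can be made concrete with a short Python
   sketch.  The octet string and the two charsets are arbitrary
   examples chosen for illustration, and ace_from_local is a
   hypothetical helper; the point is only that identical octets yield
   different ACE names when two applications assume different local
   character sets.

```python
def ace_from_local(raw: bytes, local_charset: str) -> bytes:
    """Transcode locally encoded octets to Unicode, then apply ToASCII."""
    return raw.decode(local_charset).encode("idna")


typed = b"b\xfccher.example"  # the same octets in both cases

# One application assumes ISO 8859-1, another assumes ISO 8859-7.
as_latin1 = ace_from_local(typed, "iso8859_1")
as_greek = ace_from_local(typed, "iso8859_7")

# The two applications would now query different domain names.
names_differ = as_latin1 != as_greek
```

   Here as_latin1 is b"xn--bcher-kva.example", while as_greek is a
   different ACE name, so names_differ is true.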

   Because this document normatively refers to [NAMEPREP], [PUNYCODE],
   and [STRINGPREP], it includes the security considerations from those
   documents as well.

   If or when this specification is updated to use a more recent Unicode
   normalization table, the new normalization table will need to be
   compared with the old to spot backwards incompatible changes.  If
   there are such changes, they will need to be handled somehow, or
   there will be security as well as operational implications.  Methods
   to handle the conflicts could include keeping the old normalization,
   or taking care of the conflicting characters by operational means, or
   some other method.

   Implementations MUST NOT use more recent normalization tables than
   the one referenced from this document, even though more recent tables
   may be provided by operating systems.  If an application is unsure of
   which version of the normalization tables is in the operating
   system, the application needs to include the normalization tables
   itself.  Using normalization tables other than the one referenced
   from this specification could have security and operational
   implications.
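
   As one possible way to follow this requirement, Python's standard
   library ships a frozen copy of the Unicode 3.2.0 database as
   unicodedata.ucd_3_2_0, alongside the newer tables used by default.
   The sketch below covers only the normalization step of [NAMEPREP],
   not the whole profile, and the function name normalize_for_idna is
   hypothetical.

```python
import unicodedata
from unicodedata import ucd_3_2_0


def normalize_for_idna(text: str) -> str:
    """Apply NFKC using the pinned Unicode 3.2.0 tables."""
    return ucd_3_2_0.normalize("NFKC", text)


# The pinned tables report 3.2.0; the default tables track whatever
# Unicode version the interpreter was built with.
pinned_version = ucd_3_2_0.unidata_version
default_version = unicodedata.unidata_version
```

   Comparing pinned_version with default_version shows at run time
   whether the default tables are newer than the ones this document
   references.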

   To help prevent confusion between characters that are visually
   similar, it is suggested that implementations provide visual
   indications where a domain name contains multiple scripts.  Such
   mechanisms can also be used to show when a name contains a mixture of
   simplified and traditional Chinese characters, or to distinguish zero
   and one from O and l.  DNS zone administrators may impose restrictions
   (subject to the limitations in section 2) that try to minimize
   homographs.
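
   A rough sketch of such a multiple-script check is given below in
   Python.  The standard library does not expose the Unicode Script
   property, so this example approximates the script by the first word
   of each character's Unicode name; that heuristic, and the function
   names, are assumptions of this sketch rather than a recommended
   algorithm.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Approximate the set of scripts used by the letters of a label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # For example "CYRILLIC SMALL LETTER A" gives "CYRILLIC".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    """True if the label appears to mix letters from several scripts."""
    return len(scripts_in_label(label)) > 1
```

   A user interface could use is_mixed_script to decide when to add a
   visual indication, for instance for a label in which a Cyrillic
   letter (U+0430) replaces a Latin "a".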

   Domain names (or portions of them) are sometimes compared against a
   set of privileged or anti-privileged domains.  In such situations it
   is especially important that the comparisons be done properly, as
   specified in section 3.1 requirement 4.  For labels already in ASCII
   form, the proper comparison reduces to the same case-insensitive
   ASCII comparison that has always been used for ASCII labels.
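
   The comparison can be sketched in Python as follows, using the
   "idna" codec of the standard library for ToASCII.  The deny list
   and the names idna_equal and is_blocked are hypothetical and serve
   only to show a check against a set of anti-privileged domains.

```python
def idna_equal(name_a: str, name_b: str) -> bool:
    """Compare two domain names by their ASCII forms, ignoring case."""
    # ToASCII is applied to each label before the comparison.
    return name_a.encode("idna").lower() == name_b.encode("idna").lower()


# Hypothetical deny list holding one internationalized name.
BLOCKED = ["b\u00fccher.example"]


def is_blocked(name: str) -> bool:
    """True if the name is equivalent to any entry of the deny list."""
    return any(idna_equal(name, entry) for entry in BLOCKED)
```

   With this sketch the Unicode form, the ACE form, and case variants
   of either are all matched by the same list entry, whereas a plain
   string comparison would miss some of them.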

   The introduction of IDNA means that any existing labels that start
   with the ACE prefix and would be altered by ToUnicode will
   automatically be ACE labels, and will be considered equivalent to
   non-ASCII labels, whether or not that was the intent of the zone
   administrator or registrant.

11. IANA Considerations

   IANA has assigned the ACE prefix in consultation with the IESG.

12. Authors' Addresses

   Patrik Faltstrom
   Cisco Systems
   Arstaangsvagen 31 J
   S-117 43 Stockholm  Sweden

   EMail: paf@cisco.com


   Paul Hoffman
   Internet Mail Consortium and VPN Consortium
   127 Segre Place
   Santa Cruz, CA  95060  USA

   EMail: phoffman@imc.org


   Adam M. Costello
   University of California, Berkeley

   URL: http://www.nicemice.net/amc/

13. Full Copyright Statement

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.