Network Working Group                                       P. Faltstrom
Request for Comments: 3490                                         Cisco
Category: Standards Track                                     P. Hoffman
                                                              IMC & VPNC
                                                             A. Costello
                                                             UC Berkeley
                                                              March 2003


         Internationalizing Domain Names in Applications (IDNA)

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements. Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

   Until now, there has been no standard method for domain names to use
   characters outside the ASCII repertoire. This document defines
   internationalized domain names (IDNs) and a mechanism called
   Internationalizing Domain Names in Applications (IDNA) for handling
   them in a standard fashion.
   IDNs use characters drawn from a large repertoire (Unicode), but IDNA
   allows the non-ASCII characters to be represented using only the
   ASCII characters already allowed in so-called host names today. This
   backward-compatible representation is required in existing protocols
   like DNS, so that IDNs can be introduced with no changes to the
   existing infrastructure. IDNA is only meant for processing domain
   names, not free text.

Table of Contents

   1. Introduction
     1.1 Problem Statement
     1.2 Limitations of IDNA
     1.3 Brief overview for application developers
   2. Terminology
   3. Requirements and applicability
     3.1 Requirements
     3.2 Applicability
       3.2.1. DNS resource records



Faltstrom, et al.           Standards Track                     [Page 1]

RFC 3490                          IDNA                        March 2003


       3.2.2. Non-domain-name data types stored in domain names
   4. Conversion operations
     4.1 ToASCII
     4.2 ToUnicode
   5. ACE prefix
   6. Implications for typical applications using DNS
     6.1 Entry and display in applications
     6.2 Applications and resolver libraries
     6.3 DNS servers
     6.4 Avoiding exposing users to the raw ACE encoding
     6.5 DNSSEC authentication of IDN domain names
   7. Name server considerations
   8. Root server considerations
   9. References
     9.1 Normative References
     9.2 Informative References
   10. Security Considerations
   11. IANA Considerations
   12. Authors' Addresses
   13. Full Copyright Statement

1. Introduction

   IDNA works by allowing applications to use certain ASCII name labels
   (beginning with a special prefix) to represent non-ASCII name labels.
   Lower-layer protocols need not be aware of this; therefore IDNA does
   not depend on changes to any infrastructure. In particular, IDNA
   does not depend on any changes to DNS servers, resolvers, or protocol
   elements, because the ASCII name service provided by the existing DNS
   is entirely sufficient for IDNA.

   This document does not require any applications to conform to IDNA,
   but applications can elect to use IDNA in order to support IDN while
   maintaining interoperability with existing infrastructure. If an
   application wants to use non-ASCII characters in domain names, IDNA
   is the only currently-defined option. Adding IDNA support to an
   existing application entails changes to the application only, and
   leaves room for flexibility in the user interface.

   A great deal of the discussion of IDN solutions has focused on
   transition issues and how IDN will work in a world where not all of
   the components have been updated. Proposals that were not chosen by
   the IDN Working Group would depend on user applications, resolvers,
   and DNS servers being updated in order for a user to use an
   internationalized domain name.
   Rather than rely on widespread updating of all components, IDNA
   depends on updates to user applications only; no changes are needed
   to the DNS protocol or any DNS servers or the resolvers on users'
   computers.

1.1 Problem Statement

   The IDNA specification solves the problem of extending the repertoire
   of characters that can be used in domain names to include the Unicode
   repertoire (with some restrictions).

   IDNA does not extend the service offered by DNS to the applications.
   Instead, the applications (and, by implication, the users) continue
   to see an exact-match lookup service. Either there is a single
   exactly-matching name or there is no match. This model has served
   the existing applications well, but it requires, with or without
   internationalized domain names, that users know the exact spelling of
   the domain names that the users type into applications such as web
   browsers and mail user agents.
   The introduction of the larger repertoire of characters potentially
   makes the set of misspellings larger, especially given that in some
   cases the same appearance, for example on a business card, might
   visually match several Unicode code points or several sequences of
   code points.

   IDNA allows the graceful introduction of IDNs not only by avoiding
   upgrades to existing infrastructure (such as DNS servers and mail
   transport agents), but also by allowing some rudimentary use of IDNs
   in applications by using the ASCII representation of the non-ASCII
   name labels. While such names are very user-unfriendly to read and
   type, and hence are not suitable for user input, they allow (for
   instance) replying to email and clicking on URLs even though the
   domain name displayed is incomprehensible to the user. In order to
   allow user-friendly input and output of the IDNs, the applications
   need to be modified to conform to this specification.

   IDNA uses the Unicode character repertoire, which avoids the
   significant delays that would be inherent in waiting for a different
   and specific character set to be defined for IDN purposes by some
   other standards developing organization.

1.2 Limitations of IDNA

   The IDNA protocol does not solve all linguistic issues with users
   inputting names in different scripts. Many important language-based
   and script-based mappings are not covered in IDNA and need to be
   handled outside the protocol. For example, names that are entered in
   a mix of traditional and simplified Chinese characters will not be
   mapped to a single canonical name. Another example is that
   Scandinavian names entered with U+00F6 (LATIN SMALL LETTER O WITH
   DIAERESIS) will not be mapped to U+00F8 (LATIN SMALL LETTER O WITH
   STROKE).

   An example of an important issue that is not considered in detail in
   IDNA is how to provide a high probability that a user who is entering
   a domain name based on visual information (such as from a business
   card or billboard) or aural information (such as from a telephone or
   radio) would correctly enter the IDN.
   Similar issues exist for ASCII domain names, for example the possible
   visual confusion between the letter 'O' and the digit zero, but the
   introduction of the larger repertoire of characters creates more
   opportunities for similar-looking and similar-sounding names. Note
   that this is a complex issue relating to languages, input methods on
   computers, and so on. Furthermore, the kind of matching and searching
   necessary for a high probability of success would not fit the role of
   the DNS and its exact matching function.

1.3 Brief overview for application developers

   Applications can use IDNA to support internationalized domain names
   anywhere that ASCII domain names are already supported, including DNS
   master files and resolver interfaces. (Applications can also define
   protocols and interfaces that support IDNs directly using non-ASCII
   representations. IDNA does not prescribe any particular
   representation for new protocols, but it still defines which names
   are valid and how they are compared.)

   The IDNA protocol is contained completely within applications. It is
   not a client-server or peer-to-peer protocol: everything is done
   inside the application itself. When used with a DNS resolver
   library, IDNA is inserted as a "shim" between the application and the
   resolver library.
   When used for writing names into a DNS zone, IDNA is used just before
   the name is committed to the zone.

   There are two operations described in section 4 of this document:

   - The ToASCII operation is used before sending an IDN to something
     that expects ASCII names (such as a resolver) or writing an IDN
     into a place that expects ASCII names (such as a DNS master file).

   - The ToUnicode operation is used when displaying names to users,
     for example names obtained from a DNS zone.

   It is important to note that the ToASCII operation can fail. If it
   fails when processing a domain name, that domain name cannot be used
   as an internationalized domain name and the application has to have
   some method of dealing with this failure.

   IDNA requires that implementations process input strings with
   Nameprep [NAMEPREP], which is a profile of Stringprep [STRINGPREP],
   and then with Punycode [PUNYCODE]. Implementations of IDNA MUST
   fully implement Nameprep and Punycode; neither Nameprep nor Punycode
   is optional.

2. Terminology

   The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED",
   and "MAY" in this document are to be interpreted as described in BCP
   14, RFC 2119 [RFC2119].

   A code point is an integer value associated with a character in a
   coded character set.

   Unicode [UNICODE] is a coded character set containing tens of
   thousands of characters. A single Unicode code point is denoted by
   "U+" followed by four to six hexadecimal digits, while a range of
   Unicode code points is denoted by two hexadecimal numbers separated
   by "..", with no prefixes.

   ASCII means US-ASCII [USASCII], a coded character set containing 128
   characters associated with code points in the range 0..7F. Unicode
   is an extension of ASCII: it includes all the ASCII characters and
   associates them with the same code points.

   The term "LDH code points" is defined in this document to mean the
   code points associated with ASCII letters, digits, and the
   hyphen-minus; that is, U+002D, 30..39, 41..5A, and 61..7A. "LDH" is
   an abbreviation for "letters, digits, hyphen".

   [STD13] talks about "domain names" and "host names", but many people
   use the terms interchangeably.
   Further, because [STD13] was not terribly clear, many people who are
   sure they know the exact definitions of each of these terms disagree
   on the definitions. In this document the term "domain name" is used
   in general. This document explicitly cites [STD3] whenever referring
   to the host name syntax restrictions defined therein.

   A label is an individual part of a domain name. Labels are usually
   shown separated by dots; for example, the domain name
   "www.example.com" is composed of three labels: "www", "example", and
   "com". (The zero-length root label described in [STD13], which can
   be explicit as in "www.example.com." or implicit as in
   "www.example.com", is not considered a label in this specification.)
   IDNA extends the set of usable characters in labels that are text.
   For the rest of this document, the term "label" is shorthand for
   "text label", and "every label" means "every text label".

   An "internationalized label" is a label to which the ToASCII
   operation (see section 4) can be applied without failing (with the
   UseSTD3ASCIIRules flag unset).
   This implies that every ASCII label that satisfies the [STD13] length
   restriction is an internationalized label. Therefore the term
   "internationalized label" is a generalization, embracing both old
   ASCII labels and new non-ASCII labels. Although most Unicode
   characters can appear in internationalized labels, ToASCII will fail
   for some input strings, and such strings are not valid
   internationalized labels.

   An "internationalized domain name" (IDN) is a domain name in which
   every label is an internationalized label. This implies that every
   ASCII domain name is an IDN (which implies that it is possible for a
   name to be an IDN without it containing any non-ASCII characters).
   This document does not attempt to define an "internationalized host
   name". Just as has been the case with ASCII names, some DNS zone
   administrators may impose restrictions, beyond those imposed by DNS
   or IDNA, on the characters or strings that may be registered as
   labels in their zones. Such restrictions have no impact on the
   syntax or semantics of DNS protocol messages; a query for a name that
   matches no records will yield the same response regardless of the
   reason why it is not in the zone. Clients issuing queries or
   interpreting responses cannot be assumed to have any knowledge of
   zone-specific restrictions or conventions.
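
   The definitions of "internationalized label" and "internationalized
   domain name" can be tested directly by attempting ToASCII. The sketch
   below is illustrative only: it assumes Python's standard-library
   encodings.idna module, whose ToASCII() function implements this
   document's ToASCII with the UseSTD3ASCIIRules flag unset and raises
   UnicodeError when the operation fails. The helper names
   is_internationalized_label() and is_idn() are hypothetical.

```python
import re
from encodings import idna

# The four label separators listed in requirement 1 of section 3.1.
LABEL_SEPARATORS = re.compile("[\u002e\u3002\uff0e\uff61]")


def is_internationalized_label(label: str) -> bool:
    """True if ToASCII (with UseSTD3ASCIIRules unset) accepts the label."""
    try:
        idna.ToASCII(label)  # raises UnicodeError when ToASCII fails
    except UnicodeError:
        return False
    return True


def is_idn(name: str) -> bool:
    """True if every label of the name is an internationalized label."""
    labels = LABEL_SEPARATORS.split(name)
    if len(labels) > 1 and labels[-1] == "":
        labels.pop()  # explicit root label: not a label in this specification
    return all(is_internationalized_label(label) for label in labels)


print(is_idn("www.example.com"))             # True: plain ASCII name
print(is_idn("www.bücher.example"))          # True: non-ASCII label
print(is_idn("www.xn--ü.example"))           # False: non-ASCII label with ACE prefix
print(is_internationalized_label("a" * 64))  # False: longer than 63 code points
```

   In this sketch, the last two calls fail inside ToASCII: a non-ASCII
   label is rejected if it already begins with the ACE prefix, and any
   label longer than 63 code points is rejected by the length check.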

   In IDNA, equivalence of labels is defined in terms of the ToASCII
   operation, which constructs an ASCII form for a given label, whether
   or not the label was already an ASCII label. Labels are defined to
   be equivalent if and only if their ASCII forms produced by ToASCII
   match using a case-insensitive ASCII comparison. ASCII labels
   already have a notion of equivalence: upper case and lower case are
   considered equivalent. The IDNA notion of equivalence is an
   extension of that older notion. Equivalent labels in IDNA are
   treated as alternate forms of the same label, just as "foo" and "Foo"
   are treated as alternate forms of the same label.

   To allow internationalized labels to be handled by existing
   applications, IDNA uses an "ACE label" (ACE stands for ASCII
   Compatible Encoding). An ACE label is an internationalized label
   that can be rendered in ASCII and is equivalent to an
   internationalized label that cannot be rendered in ASCII. Given any
   internationalized label that cannot be rendered in ASCII, the ToASCII
   operation will convert it to an equivalent ACE label (whereas an
   ASCII label will be left unaltered by ToASCII). ACE labels are
   unsuitable for display to users. The ToUnicode operation will
   convert any label to an equivalent non-ACE label.
   In fact, an ACE label is formally defined to be any label that the
   ToUnicode operation would alter (whereas non-ACE labels are left
   unaltered by ToUnicode). Every ACE label begins with the ACE prefix
   specified in section 5. The ToASCII and ToUnicode operations are
   specified in section 4.

   The "ACE prefix" is defined in this document to be a string of ASCII
   characters that appears at the beginning of every ACE label. It is
   specified in section 5.

   A "domain name slot" is defined in this document to be a protocol
   element or a function argument or a return value (and so on)
   explicitly designated for carrying a domain name. Examples of domain
   name slots include: the QNAME field of a DNS query; the name argument
   of the gethostbyname() library function; the part of an email address
   following the at-sign (@) in the From: field of an email message
   header; and the host portion of the URI in the src attribute of an
   HTML <IMG> tag.
   General text that just happens to contain a domain name is not a
   domain name slot; for example, a domain name appearing in the plain
   text body of an email message is not occupying a domain name slot.

   An "IDN-aware domain name slot" is defined in this document to be a
   domain name slot explicitly designated for carrying an
   internationalized domain name as defined in this document. The
   designation may be static (for example, in the specification of the
   protocol or interface) or dynamic (for example, as a result of
   negotiation in an interactive session).

   An "IDN-unaware domain name slot" is defined in this document to be
   any domain name slot that is not an IDN-aware domain name slot.
   Obviously, this includes any domain name slot whose specification
   predates IDNA.

3. Requirements and applicability

3.1 Requirements

   IDNA conformance means adherence to the following four requirements:

   1) Whenever dots are used as label separators, the following
      characters MUST be recognized as dots: U+002E (full stop), U+3002
      (ideographic full stop), U+FF0E (fullwidth full stop), U+FF61
      (halfwidth ideographic full stop).

   2) Whenever a domain name is put into an IDN-unaware domain name slot
      (see section 2), it MUST contain only ASCII characters. Given an
      internationalized domain name (IDN), an equivalent domain name
      satisfying this requirement can be obtained by applying the
      ToASCII operation (see section 4) to each label and, if dots are
      used as label separators, changing all the label separators to
      U+002E.

   3) ACE labels obtained from domain name slots SHOULD be hidden from
      users when it is known that the environment can handle the non-ACE
      form, except when the ACE form is explicitly requested. When it
      is not known whether or not the environment can handle the non-ACE
      form, the application MAY use the non-ACE form (which might fail,
      such as by not being displayed properly), or it MAY use the ACE
      form (which will look unintelligible to the user). Given an
      internationalized domain name, an equivalent domain name
      containing no ACE labels can be obtained by applying the ToUnicode
      operation (see section 4) to each label. When requirements 2 and
      3 both apply, requirement 2 takes precedence.

   4) Whenever two labels are compared, they MUST be considered to match
      if and only if they are equivalent, that is, their ASCII forms
      (obtained by applying ToASCII) match using a case-insensitive
      ASCII comparison. Whenever two names are compared, they MUST be
      considered to match if and only if their corresponding labels
      match, regardless of whether the names use the same forms of label
      separators.

3.2 Applicability

   IDNA is applicable to all domain names in all domain name slots
   except where it is explicitly excluded.

   This implies that IDNA is applicable to many protocols that predate
   IDNA. Note that IDNs occupying domain name slots in those protocols
   MUST be in ASCII form (see section 3.1, requirement 2).

3.2.1. DNS resource records

   IDNA does not apply to domain names in the NAME and RDATA fields of
   DNS resource records whose CLASS is not IN. This exclusion applies
   to every non-IN class, present and future, except where future
   standards override this exclusion by explicitly inviting the use of
   IDNA.
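
   The CLASS rule above, together with requirement 4 of section 3.1,
   can be illustrated with a small comparison routine. This is a sketch
   under stated assumptions, not part of IDNA: it uses Python's
   encodings.idna module for ToASCII (which raises UnicodeError when
   ToASCII fails), it assumes names carry no explicit trailing root
   label, and it assumes that for non-IN classes names are compared
   case-insensitively as plain ASCII strings, with only U+002E acting as
   a separator. The helpers labels_equivalent() and owner_names_match()
   are hypothetical.

```python
import re
import string
from encodings import idna

# Requirement 1 (section 3.1): the four characters recognized as dots.
IDNA_DOTS = re.compile("[\u002e\u3002\uff0e\uff61]")
# Folds only A-Z to a-z, giving a case-insensitive *ASCII* comparison.
ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def labels_equivalent(a: str, b: str) -> bool:
    """Requirement 4: compare the ToASCII forms, ignoring ASCII case."""
    # ToASCII returns bytes; bytes.lower() changes only ASCII letters.
    return idna.ToASCII(a).lower() == idna.ToASCII(b).lower()


def owner_names_match(a: str, b: str, rr_class: str) -> bool:
    """Hypothetical helper: do two owner names of the given CLASS match?"""
    if rr_class.upper() != "IN":
        # Section 3.2.1: IDNA does not apply outside CLASS IN. Assumed here
        # to mean a plain ASCII case-insensitive string comparison.
        return a.translate(ASCII_FOLD) == b.translate(ASCII_FOLD)
    labels_a = IDNA_DOTS.split(a)
    labels_b = IDNA_DOTS.split(b)
    # Names match iff corresponding labels are equivalent, regardless of
    # which of the four separators each name used.
    return len(labels_a) == len(labels_b) and all(
        labels_equivalent(x, y) for x, y in zip(labels_a, labels_b)
    )


print(owner_names_match("BÜCHER\u3002example", "xn--bcher-kva.EXAMPLE", "IN"))  # True
print(owner_names_match("bücher.example", "xn--bcher-kva.example", "CH"))      # False
```

   Letting UnicodeError propagate, rather than returning False, leaves
   the application to decide how to deal with a ToASCII failure, as
   section 1.3 requires.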

   There are currently no other exclusions on the applicability of IDNA
   to DNS resource records; it depends entirely on the CLASS, and not on
   the TYPE. This will remain true, even as new types are defined,
   unless there is a compelling reason for a new type to complicate
   matters by imposing type-specific rules.

3.2.2. Non-domain-name data types stored in domain names

   Although IDNA enables the representation of non-ASCII characters in
   domain names, that does not imply that IDNA enables the
   representation of non-ASCII characters in other data types that are
   stored in domain names. For example, an email address local part is
   sometimes stored in a domain label (hostmaster@example.com would be
   represented as hostmaster.example.com in the RDATA field of an SOA
   record). IDNA does not update the existing email standards, which
   allow only ASCII characters in local parts.
   Therefore, unless the email standards are revised to invite the use
   of IDNA for local parts, a domain label that holds the local part of
   an email address SHOULD NOT begin with the ACE prefix, and even if it
   does, it is to be interpreted literally as a local part that happens
   to begin with the ACE prefix.

4. Conversion operations

   An application converts a domain name before putting it into an
   IDN-unaware slot or displaying it to a user. This section specifies
   the steps to perform in the conversion, and the ToASCII and ToUnicode
   operations.

   The input to ToASCII or ToUnicode is a single label that is a
   sequence of Unicode code points (remember that all ASCII code points
   are also Unicode code points). If a domain name is represented using
   a character set other than Unicode or US-ASCII, it will first need to
   be transcoded to Unicode.

   Starting from a whole domain name, the steps that an application
   takes to do the conversions are:

   1) Decide whether the domain name is a "stored string" or a "query
      string" as described in [STRINGPREP]. If this conversion follows
      the "queries" rule from [STRINGPREP], set the flag called
      "AllowUnassigned".

   2) Split the domain name into individual labels as described in
      section 3.1. The labels do not include the separator.

   3) For each label, decide whether or not to enforce the restrictions
      on ASCII characters in host names [STD3]. (Applications already
      faced this choice before the introduction of IDNA, and can
      continue to make the decision the same way they always have; IDNA
      makes no new recommendations regarding this choice.) If the
      restrictions are to be enforced, set the flag called
      "UseSTD3ASCIIRules" for that label.

   4) Process each label with either the ToASCII or the ToUnicode
      operation as appropriate. Typically, you use the ToASCII
      operation if you are about to put the name into an IDN-unaware
      slot, and you use the ToUnicode operation if you are displaying
      the name to a user; section 3.1 gives greater detail on the
      applicable requirements.

   5) If ToASCII was applied in step 4 and dots are used as label
      separators, change all the label separators to U+002E (full stop).
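
   The steps above can be sketched in Python as follows; this is a
   minimal illustration, not a reference implementation. It delegates
   the per-label work to the standard library's documented
   encodings.idna module, whose ToASCII assumes UseSTD3ASCIIRules is not
   set and whose nameprep assumes query strings (AllowUnassigned set),
   so steps 1 and 3 are fixed rather than chosen here. The helper names
   are hypothetical. CPython's ToUnicode raises UnicodeError when a step
   fails, so the sketch catches it to keep the "never fails" behavior
   that section 4.2 specifies.

```python
import re
from encodings import idna  # stdlib RFC 3490 ToASCII/ToUnicode helpers

# Label separators from section 3.1: full stop, ideographic full stop,
# fullwidth full stop, and halfwidth ideographic full stop.
SEPARATORS = "[\u002e\u3002\uff0e\uff61]"


def domain_to_ascii(name: str) -> str:
    """Hypothetical helper: steps 2, 4 and 5 for an IDN-unaware slot.

    Raises UnicodeError if ToASCII fails on any label; per section 4.1
    the whole name must then not be used as an IDN.
    """
    labels = re.split(SEPARATORS, name)                    # step 2
    ascii_labels = [idna.ToASCII(lbl) for lbl in labels]   # step 4
    return b".".join(ascii_labels).decode("ascii")         # step 5: U+002E


def domain_to_unicode(name: str) -> str:
    """Hypothetical helper: steps 2 and 4 for display to a user."""
    # The capturing group keeps the original separators, since step 5
    # only applies after ToASCII; even indexes hold the labels.
    parts = re.split("(" + SEPARATORS + ")", name)
    for i in range(0, len(parts), 2):
        try:
            parts[i] = idna.ToUnicode(parts[i])
        except UnicodeError:
            pass  # ToUnicode never fails: keep the label unaltered
    return "".join(parts)
```

   For example, domain_to_ascii("bücher\u3002example") yields
   "xn--bcher-kva.example". An empty label, such as the one produced by
   a trailing dot, makes ToASCII fail at step 8, so an application that
   accepts fully qualified names would need to strip the root separator
   first.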

   The following two subsections define the ToASCII and ToUnicode
   operations that are used in step 4.

   This description of the protocol uses specific procedure names, names
   of flags, and so on, in order to facilitate the specification of the
   protocol. These names, as well as the actual steps of the
   procedures, are not required of an implementation. In fact, any
   implementation which has the same external behavior as specified in
   this document conforms to this specification.

4.1 ToASCII

   The ToASCII operation takes a sequence of Unicode code points that
   make up one label and transforms it into a sequence of code points in
   the ASCII range (0..7F). If ToASCII succeeds, the original sequence
   and the resulting sequence are equivalent labels.

   It is important to note that the ToASCII operation can fail. ToASCII
   fails if any step of it fails. If any step of the ToASCII operation
   fails on any label in a domain name, that domain name MUST NOT be
   used as an internationalized domain name. The method for dealing
   with this failure is application-specific.

   The inputs to ToASCII are a sequence of code points, the
   AllowUnassigned flag, and the UseSTD3ASCIIRules flag.
   The output of ToASCII is either a sequence of ASCII code points or a
   failure condition.

   ToASCII never alters a sequence of code points that are all in the
   ASCII range to begin with (although it could fail). Applying the
   ToASCII operation multiple times has exactly the same effect as
   applying it just once.

   ToASCII consists of the following steps:

   1. If the sequence contains any code points outside the ASCII range
      (0..7F) then proceed to step 2, otherwise skip to step 3.

   2. Perform the steps specified in [NAMEPREP] and fail if there is an
      error. The AllowUnassigned flag is used in [NAMEPREP].

   3. If the UseSTD3ASCIIRules flag is set, then perform these checks:

     (a) Verify the absence of non-LDH ASCII code points; that is, the
         absence of 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F.

     (b) Verify the absence of leading and trailing hyphen-minus; that
         is, the absence of U+002D at the beginning and end of the
         sequence.

   4. If the sequence contains any code points outside the ASCII range
      (0..7F) then proceed to step 5, otherwise skip to step 8.

   5. Verify that the sequence does NOT begin with the ACE prefix.

   6. Encode the sequence using the encoding algorithm in [PUNYCODE] and
      fail if there is an error.

   7. Prepend the ACE prefix.

   8. Verify that the number of code points is in the range 1 to 63
      inclusive.

4.2 ToUnicode

   The ToUnicode operation takes a sequence of Unicode code points that
   make up one label and returns a sequence of Unicode code points. If
   the input sequence is a label in ACE form, then the result is an
   equivalent internationalized label that is not in ACE form, otherwise
   the original sequence is returned unaltered.

   ToUnicode never fails. If any step fails, then the original input
   sequence is returned immediately in that step.

   The ToUnicode output never contains more code points than its input.
   Note that the number of octets needed to represent a sequence of code
   points depends on the particular character encoding used.

   The inputs to ToUnicode are a sequence of code points, the
   AllowUnassigned flag, and the UseSTD3ASCIIRules flag.
   The output of ToUnicode is always a sequence of Unicode code points.

   1. If all code points in the sequence are in the ASCII range (0..7F)
      then skip to step 3.

   2. Perform the steps specified in [NAMEPREP] and fail if there is an
      error. (If step 3 of ToASCII is also performed here, it will not
      affect the overall behavior of ToUnicode, but it is not
      necessary.) The AllowUnassigned flag is used in [NAMEPREP].

   3. Verify that the sequence begins with the ACE prefix, and save a
      copy of the sequence.

   4. Remove the ACE prefix.

   5. Decode the sequence using the decoding algorithm in [PUNYCODE] and
      fail if there is an error. Save a copy of the result of this
      step.

   6. Apply ToASCII.

   7. Verify that the result of step 6 matches the saved copy from step
      3, using a case-insensitive ASCII comparison.

   8. Return the saved copy from step 5.

5. ACE prefix

   The ACE prefix, used in the conversion operations (section 4), is two
   alphanumeric ASCII characters followed by two hyphen-minuses. It
   cannot be any of the prefixes already used in earlier documents,
   which include the following: "bl--", "bq--", "dq--", "lq--", "mq--",
   "ra--", "wq--" and "zq--". The ToASCII and ToUnicode operations MUST
   recognize the ACE prefix in a case-insensitive manner.

   The ACE prefix for IDNA is "xn--" or any capitalization thereof.

   This means that an ACE label might be "xn--de-jg4avhby1noc0d", where
   "de-jg4avhby1noc0d" is the part of the ACE label that is generated by
   the encoding steps in [PUNYCODE].

   While all ACE labels begin with the ACE prefix, not all labels
   beginning with the ACE prefix are necessarily ACE labels. Non-ACE
   labels that begin with the ACE prefix will confuse users and SHOULD
   NOT be allowed in DNS zones.

6. Implications for typical applications using DNS

   In IDNA, applications perform the processing needed to input
   internationalized domain names from users, display internationalized
   domain names to users, and process the inputs and outputs from DNS
   and other protocols that carry domain names.

   The components and interfaces between them can be represented
   pictorially as:

                    +------+
                    | User |
                    +------+
                       ^
                       | Input and display: local interface methods
                       | (pen, keyboard, glowing phosphorus, ...)
   +-------------------|-------------------------------+
   |                   v                               |
   |     +-----------------------------+               |
   |     |         Application         |               |
   |     |   (ToASCII and ToUnicode    |               |
   |     |      operations may be      |               |
   |     |        called here)         |               |
   |     +-----------------------------+               |
   |                   ^        ^                      | End system
   |                   |        |                      |
   | Call to resolver: |        | Application-specific |
   |               ACE |        | protocol:            |
   |                   v        | ACE unless the       |
   |            +----------+    | protocol is updated  |
   |            | Resolver |    | to handle other      |
   |            +----------+    | encodings            |
   |                 ^          |                      |
   +-----------------|----------|----------------------+
       DNS protocol: |          |
                 ACE |          |
                     v          v
            +-------------+   +---------------------+
            | DNS servers |   | Application servers |
            +-------------+   +---------------------+

   The box labeled "Application" is where the application splits a
   domain name into labels, sets the appropriate flags, and performs the
   ToASCII and ToUnicode operations. This is described in section 4.

6.1 Entry and display in applications

   Applications can accept domain names using any character set or sets
   desired by the application developer, and can display domain names in
   any charset. That is, the IDNA protocol does not affect the
   interface between users and applications.

   An IDNA-aware application can accept and display internationalized
   domain names in two formats: in the internationalized character
   set(s) supported by the application, and as ACE labels. ACE labels
   that are displayed or input MUST always include the ACE prefix.
   Applications MAY allow input and display of ACE labels, but are not
   encouraged to do so except as an interface for special purposes,
   possibly for debugging, or to cope with display limitations as
   described in section 6.4. ACE encoding is opaque and ugly, and
   should thus only be exposed to users who absolutely need it. Because
   name labels encoded as ACE name labels can be rendered either as the
   encoded ASCII characters or the proper decoded characters, the
   application MAY have an option for the user to select the preferred
   method of display; if it does, rendering the ACE SHOULD NOT be the
   default.

   Domain names are often stored and transported in many places. For
   example, they are part of documents such as mail messages and web
   pages. They are transported in many parts of many protocols, such as
   both the control commands and the RFC 2822 body parts of SMTP, and
   the headers and the body content in HTTP. It is important to
   remember that domain names appear both in domain name slots and in
   the content that is passed over protocols.

   In protocols and document formats that define how to handle
   specification or negotiation of charsets, labels can be encoded in
   any charset allowed by the protocol or document format.
   If a protocol or document format only allows one charset, the labels
   MUST be given in that charset.

   In any place where a protocol or document format allows transmission
   of the characters in internationalized labels, internationalized
   labels SHOULD be transmitted using whatever character encoding and
   escape mechanism that the protocol or document format uses at that
   place.

   All protocols that use domain name slots already have the capacity
   for handling domain names in the ASCII charset. Thus, ACE labels
   (internationalized labels that have been processed with the ToASCII
   operation) can inherently be handled by those protocols.

6.2 Applications and resolver libraries

   Applications normally use functions in the operating system when they
   resolve DNS queries. Those functions in the operating system are
   often called "the resolver library", and the applications communicate
   with the resolver libraries through a programming interface (API).
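
   A small Python sketch of that boundary follows, assuming the resolver
   library is reached through socket.getaddrinfo. In line with the
   ToASCII requirement in the next paragraph, it converts the name with
   Python's built-in "idna" codec (an RFC 3490 implementation with the
   same flag assumptions as encodings.idna) before the call. The lookup
   function and its injectable resolver parameter are hypothetical,
   added only so the conversion can be observed without network access.

```python
import socket
from typing import Any, Callable


def lookup(name: str, port: int = 443,
           resolver: Callable[[str, int], Any] = socket.getaddrinfo) -> Any:
    """Hypothetical helper: hand the resolver only the ASCII (ACE) form."""
    # The "idna" codec applies ToASCII label by label and raises
    # UnicodeError if any label fails, so no non-ASCII name reaches
    # the resolver library.
    ascii_name = name.encode("idna").decode("ascii")
    return resolver(ascii_name, port)
```

   Called as lookup("bücher.example"), the resolver receives
   "xn--bcher-kva.example". Keeping the conversion in one wrapper means
   the rest of the application can work with Unicode names throughout.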

   Because these resolver libraries today expect only domain names in
   ASCII, applications MUST prepare labels that are passed to the
   resolver library using the ToASCII operation. Labels received from
   the resolver library contain only ASCII characters; internationalized
   labels that cannot be represented directly in ASCII use the ACE form.
   ACE labels always include the ACE prefix.

   An operating system might have a set of libraries for performing the
   ToASCII operation. The input to such a library might be in one or
   more charsets that are used in applications (UTF-8 and UTF-16 are
   likely candidates for almost any operating system, and
   script-specific charsets are likely for localized operating systems).

   IDNA-aware applications MUST be able to work with both
   non-internationalized labels (those that conform to [STD13] and
   [STD3]) and internationalized labels.

   It is expected that new versions of the resolver libraries in the
   future will be able to accept domain names in charsets other than
   ASCII, and application developers might one day pass domain names not
   only in Unicode, but also in local scripts, to a new API for the
   resolver libraries in the operating system.
   Thus the ToASCII and ToUnicode operations might be performed inside
   these new versions of the resolver libraries.

   Domain names passed to resolvers or put into the question section of
   DNS requests follow the rules for "queries" from [STRINGPREP].

6.3 DNS servers

   Domain names stored in zones follow the rules for "stored strings"
   from [STRINGPREP].

   For internationalized labels that cannot be represented directly in
   ASCII, DNS servers MUST use the ACE form produced by the ToASCII
   operation. All IDNs served by DNS servers MUST contain only ASCII
   characters.

   If a signaling system that makes negotiation possible between old
   and new DNS clients and servers is standardized in the future, the
   encoding of the query in the DNS protocol itself can be changed from
   ACE to something else, such as UTF-8. Whether such an encoding
   should be used is, however, a separate question that is not
   discussed in this memo.
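
   A zone provisioning tool could check these rules, together with the
   section 5 advice against non-ACE labels that begin with the ACE
   prefix, before names are loaded. The Python sketch below is one
   hypothetical way to do so with the standard library's encodings.idna
   module. Because that module's nameprep assumes query strings, the
   stricter "stored strings" treatment of unassigned code points is not
   checked here.

```python
from encodings import idna  # stdlib RFC 3490 helpers
from typing import List

ACE_PREFIX = "xn--"


def zone_name_problems(name: str) -> List[str]:
    """Hypothetical pre-load check for an owner name bound for a zone."""
    problems = []
    for label in name.rstrip(".").split("."):
        if not label.isascii():
            # Servers must hold the ToASCII output, never raw Unicode.
            problems.append(f"{label!r}: not ASCII; store the ACE form")
        elif label.lower().startswith(ACE_PREFIX):
            # The prefix is matched case-insensitively (section 5). CPython's
            # ToUnicode raises if the label is not a valid ACE label.
            try:
                idna.ToUnicode(label.lower())
            except UnicodeError:
                problems.append(f"{label!r}: ACE prefix but not an ACE label")
    return problems
```

   A label such as "xn--abc-" is rejected: it decodes to "abc", and
   applying ToASCII to "abc" does not reproduce the original label, so
   step 7 of ToUnicode fails.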

6.4 Avoiding exposing users to the raw ACE encoding

   Any application that might show the user a domain name obtained from
   a domain name slot, such as a name returned by gethostbyaddr or one
   taken from a mail header, will need to be updated if it is to prevent
   users from seeing the ACE.

   If an application decodes an ACE name using ToUnicode but cannot show
   all of the characters in the decoded name, such as if the name
   contains characters that the output system cannot display, the
   application SHOULD show the name in ACE format (which always includes
   the ACE prefix) instead of displaying the name with the replacement
   character (U+FFFD). This is to make it easier for the user to
   transfer the name correctly to other programs. Programs that by
   default show the ACE form when they cannot show all the characters in
   a name label SHOULD also have a mechanism to show the name that is
   produced by the ToUnicode operation with as many characters as
   possible and replacement characters in the positions where characters
   cannot be displayed.

   The ToUnicode operation does not alter labels that are not valid ACE
   labels, even if they begin with the ACE prefix.
   After ToUnicode has been applied, if a label still begins with the
   ACE prefix, then it is not a valid ACE label, and is not equivalent to
   any of the intermediate Unicode strings constructed by ToUnicode.

6.5 DNSSEC authentication of IDN domain names

   DNS Security [RFC2535] is a method for supplying cryptographic
   verification information along with DNS messages. Public Key
   Cryptography is used in conjunction with digital signatures to
   provide a means for a requester of domain information to authenticate
   the source of the data. This ensures that it can be traced back to a
   trusted source, either directly, or via a chain of trust linking the
   source of the information to the top of the DNS hierarchy.

   IDNA specifies that all internationalized domain names served by DNS
   servers that cannot be represented directly in ASCII must use the ACE
   form produced by the ToASCII operation. This operation must be
   performed prior to a zone being signed by the private key for that
   zone. Because of this ordering, it is important to recognize that
   DNSSEC authenticates the ASCII domain name, not the Unicode form or
   the mapping between the Unicode form and the ASCII form. In the
   presence of DNSSEC, this is the name that MUST be signed in the zone
   and MUST be validated against.

   One consequence of this for sites deploying IDNA in the presence of
   DNSSEC is that any special-purpose proxies or forwarders used to
   transform user input into IDNs must be earlier in the resolution flow
   than DNSSEC-authenticating nameservers for DNSSEC to work.

7. Name server considerations

   Existing DNS servers do not know the IDNA rules for handling
   non-ASCII forms of IDNs, and therefore need to be shielded from them.
   All existing channels through which names can enter a DNS server
   database (for example, master files [STD13] and DNS update messages
   [RFC2136]) are IDN-unaware because they predate IDNA, and therefore
   requirement 2 of section 3.1 of this document provides the needed
   shielding, by ensuring that internationalized domain names entering
   DNS server databases through such channels have already been
   converted to their equivalent ASCII forms.

   It is imperative that there be only one ASCII encoding for a
   particular domain name.
Because of the design of the ToASCII and 926*ae771770SStanislav Sedov ToUnicode operations, there are no ACE labels that decode to ASCII 927*ae771770SStanislav Sedov labels, and therefore name servers cannot contain multiple ASCII 928*ae771770SStanislav Sedov encodings of the same domain name. 929*ae771770SStanislav Sedov 930*ae771770SStanislav Sedov [RFC2181] explicitly allows domain labels to contain octets beyond 931*ae771770SStanislav Sedov the ASCII range (0..7F), and this document does not change that. 932*ae771770SStanislav Sedov Note, however, that there is no defined interpretation of octets 933*ae771770SStanislav Sedov 80..FF as characters. If labels containing these octets are returned 934*ae771770SStanislav Sedov to applications, unpredictable behavior could result. The ASCII form 935*ae771770SStanislav Sedov defined by ToASCII is the only standard representation for 936*ae771770SStanislav Sedov internationalized labels in the current DNS protocol. 937*ae771770SStanislav Sedov 938*ae771770SStanislav Sedov8. Root server considerations 939*ae771770SStanislav Sedov 940*ae771770SStanislav Sedov IDNs are likely to be somewhat longer than current domain names, so 941*ae771770SStanislav Sedov the bandwidth needed by the root servers is likely to go up by a 942*ae771770SStanislav Sedov small amount. Also, queries and responses for IDNs will probably be 943*ae771770SStanislav Sedov somewhat longer than typical queries today, so more queries and 944*ae771770SStanislav Sedov responses may be forced to go to TCP instead of UDP. 945*ae771770SStanislav Sedov 946*ae771770SStanislav Sedov 947*ae771770SStanislav Sedov 948*ae771770SStanislav Sedov 949*ae771770SStanislav Sedov 950*ae771770SStanislav Sedov 951*ae771770SStanislav Sedov 952*ae771770SStanislav Sedov 953*ae771770SStanislav Sedov 954*ae771770SStanislav SedovFaltstrom, et al. 

9. References

9.1 Normative References

   [RFC2119]     Bradner, S., "Key words for use in RFCs to Indicate
                 Requirement Levels", BCP 14, RFC 2119, March 1997.

   [STRINGPREP]  Hoffman, P. and M. Blanchet, "Preparation of
                 Internationalized Strings ("stringprep")", RFC 3454,
                 December 2002.

   [NAMEPREP]    Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
                 Profile for Internationalized Domain Names (IDN)", RFC
                 3491, March 2003.

   [PUNYCODE]    Costello, A., "Punycode: A Bootstring encoding of
                 Unicode for use with Internationalized Domain Names in
                 Applications (IDNA)", RFC 3492, March 2003.

   [STD3]        Braden, R., "Requirements for Internet Hosts --
                 Communication Layers", STD 3, RFC 1122, and
                 "Requirements for Internet Hosts -- Application and
                 Support", STD 3, RFC 1123, October 1989.

   [STD13]       Mockapetris, P., "Domain names - concepts and
                 facilities", STD 13, RFC 1034 and "Domain names -
                 implementation and specification", STD 13, RFC 1035,
                 November 1987.

9.2 Informative References

   [RFC2535]     Eastlake, D., "Domain Name System Security Extensions",
                 RFC 2535, March 1999.

   [RFC2181]     Elz, R. and R. Bush, "Clarifications to the DNS
                 Specification", RFC 2181, July 1997.

   [UAX9]        Unicode Standard Annex #9, The Bidirectional Algorithm,
                 <http://www.unicode.org/unicode/reports/tr9/>.

   [UNICODE]     The Unicode Consortium. The Unicode Standard, Version
                 3.2.0 is defined by The Unicode Standard, Version 3.0
                 (Reading, MA, Addison-Wesley, 2000. ISBN 0-201-61633-5),
                 as amended by the Unicode Standard Annex #27: Unicode
                 3.1 (http://www.unicode.org/reports/tr27/) and by the
                 Unicode Standard Annex #28: Unicode 3.2
                 (http://www.unicode.org/reports/tr28/).

   [USASCII]     Cerf, V., "ASCII format for Network Interchange", RFC
                 20, October 1969.

10. Security Considerations

   Security on the Internet partly relies on the DNS.
   Thus, any change to the characteristics of the DNS can change the
   security of much of the Internet.

   This memo describes an algorithm which encodes characters that are
   not valid according to [STD3] and [STD13] into octet values that are
   valid. No security issues such as string length increases or new
   allowed values are introduced by the encoding process or the use of
   these encoded values, apart from those introduced by the ACE encoding
   itself.

   Domain names are used by users to identify and connect to Internet
   servers. The security of the Internet is compromised if a user
   entering a single internationalized name is connected to different
   servers based on different interpretations of the internationalized
   domain name.

   When systems use local character sets other than ASCII and Unicode,
   this specification leaves the problem of transcoding between the
   local character set and Unicode up to the application. If different
   applications (or different versions of one application) implement
   different transcoding rules, they could interpret the same name
   differently and contact different servers. This problem is not
   solved by security protocols like TLS that do not take local
   character sets into account.
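   The transcoding hazard can be made concrete with a short Python
   sketch, again assuming CPython's `encodings.idna.ToASCII`. The three
   inputs are hypothetical: the label "bücher" as it might arrive as
   ISO-8859-1 bytes, as UTF-8 bytes, and as a decomposed Unicode string.
   Decoding each with its actual character set yields a single ASCII
   form; decoding the UTF-8 bytes as ISO-8859-1 (a plausible application
   bug) yields a different but equally well-formed ACE label, so two
   applications would query two different names.

```python
from encodings import idna  # CPython's RFC 3490 ToASCII (Nameprep uses Unicode 3.2)

# Hypothetical inputs: the label "bücher" as three different sources might supply it.
latin1_bytes = b"b\xfccher"            # ISO-8859-1 (Latin-1) encoding
utf8_bytes = "bücher".encode("utf-8")  # UTF-8 encoding: b"b\xc3\xbccher"
decomposed = "bu\u0308cher"            # "u" followed by U+0308 COMBINING DIAERESIS

# Each source decoded with the character set it was really written in.
consistent = {
    idna.ToASCII(latin1_bytes.decode("latin-1")),
    idna.ToASCII(utf8_bytes.decode("utf-8")),
    idna.ToASCII(decomposed),  # Nameprep's NFKC step composes u + U+0308 into ü
}

# A transcoding bug: UTF-8 bytes misread as Latin-1 give "bÃ¼cher", another name.
mistranscoded = idna.ToASCII(utf8_bytes.decode("latin-1"))

print(consistent)  # {b'xn--bcher-kva'}
print(mistranscoded != b"xn--bcher-kva")  # True
```

   Nothing in ToASCII can detect the second case: "bÃ¼cher" is a valid
   input, so the error surfaces only as a connection to the wrong
   server.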

   Because this document normatively refers to [NAMEPREP], [PUNYCODE],
   and [STRINGPREP], it includes the security considerations from those
   documents as well.

   If or when this specification is updated to use a more recent Unicode
   normalization table, the new normalization table will need to be
   compared with the old to spot backward-incompatible changes. If there
   are such changes, they will need to be handled, or there will be
   security as well as operational implications. Methods to handle the
   conflicts could include keeping the old normalization, or taking care
   of the conflicting characters by operational means, or some other
   method.

   Implementations MUST NOT use more recent normalization tables than
   the one referenced from this document, even though more recent tables
   may be provided by operating systems. If an application is unsure of
   which version of the normalization tables is in the operating system,
   the application needs to include the normalization tables itself.
   Using normalization tables other than the one referenced from this
   specification could have security and operational implications.

   To help prevent confusion between characters that are visually
   similar, it is suggested that implementations provide visual
   indications where a domain name contains multiple scripts. Such
   mechanisms can also be used to show when a name contains a mixture of
   simplified and traditional Chinese characters, or to distinguish zero
   and one from O and l. DNS zone administrators may impose restrictions
   (subject to the limitations in section 2) that try to minimize
   homographs.

   Domain names (or portions of them) are sometimes compared against a
   set of privileged or anti-privileged domains. In such situations it
   is especially important that the comparisons be done properly, as
   specified in section 3.1 requirement 4. For labels already in ASCII
   form, the proper comparison reduces to the same case-insensitive
   ASCII comparison that has always been used for ASCII labels.
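   A label-by-label comparison of ASCII forms, ignoring ASCII case, is
   short to express. The following Python sketch assumes CPython's
   `encodings.idna.ToASCII`, which performs Nameprep with the Unicode
   3.2 tables CPython exposes as `unicodedata.ucd_3_2_0` rather than
   with the newer tables in `unicodedata` itself. `names_match`,
   `is_within` (for checks against privileged domains), and
   `rough_scripts` are hypothetical helpers. `rough_scripts` guesses a
   letter's script from the first word of its Unicode character name;
   that is a crude stand-in for the Unicode Script property, flags
   legitimate mixtures such as Han with Hiragana, and cannot tell
   simplified from traditional Chinese.

```python
import re
import unicodedata
from encodings import idna  # CPython's RFC 3490 ToASCII (Nameprep uses ucd_3_2_0)

# Full stop plus the ideographic, fullwidth, and halfwidth ideographic full
# stops, which section 3.1 treats as label separators.
_SEPARATORS = re.compile("[.\u3002\uff0e\uff61]")


def ascii_labels(name):
    """Split a name on any separator and return each label's ASCII form, lowercased."""
    labels = _SEPARATORS.split(name)
    if len(labels) > 1 and labels[-1] == "":
        labels.pop()  # ignore a trailing root separator, as in "example."
    # bytes.lower() folds only A-Z, i.e. a case-insensitive ASCII comparison.
    return [idna.ToASCII(label).lower() for label in labels]


def names_match(a, b):
    """True if all corresponding labels have equal ASCII forms, ignoring ASCII case."""
    return ascii_labels(a) == ascii_labels(b)


def is_within(name, domain):
    """True if `name` is `domain` itself or a subdomain of it."""
    n, d = ascii_labels(name), ascii_labels(domain)
    return len(n) >= len(d) and n[len(n) - len(d):] == d


def rough_scripts(label):
    """Crude script guess: the first word of each letter's character name."""
    return {
        unicodedata.name(c, "UNKNOWN").split()[0]
        for c in label
        if unicodedata.category(c).startswith("L")
    }


print(names_match("Bücher.EXAMPLE", "xn--bcher-kva.example"))  # True
print(is_within("www.bücher.example", "xn--bcher-kva.example"))  # True
print(sorted(rough_scripts("p\u0430ypal")))  # ['CYRILLIC', 'LATIN']
```

   In the last call, U+0430 CYRILLIC SMALL LETTER A is indistinguishable
   from Latin "a" on most displays; a result with more than one entry is
   the kind of signal an implementation could surface visually.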

   The introduction of IDNA means that any existing labels that start
   with the ACE prefix and would be altered by ToUnicode will
   automatically be ACE labels, and will be considered equivalent to
   non-ASCII labels, whether or not that was the intent of the zone
   administrator or registrant.

11. IANA Considerations

   IANA has assigned the ACE prefix in consultation with the IESG.

12. Authors' Addresses

   Patrik Faltstrom
   Cisco Systems
   Arstaangsvagen 31 J
   S-117 43 Stockholm Sweden

   EMail: paf@cisco.com


   Paul Hoffman
   Internet Mail Consortium and VPN Consortium
   127 Segre Place
   Santa Cruz, CA 95060 USA

   EMail: phoffman@imc.org


   Adam M. Costello
   University of California, Berkeley

   URL: http://www.nicemice.net/amc/

13. Full Copyright Statement

   Copyright (C) The Internet Society (2003). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.