Post on 18-Dec-2015
International Domain Name
TWNIC
Nai-Wen Hsu
snw@twnic.net.tw
Domain name (RFC 1035)
• A label cannot be longer than 63 characters.
• A domain name cannot be longer than 255 characters.
• Maximum number of labels: 127.
• Only a-z, 0-9, and '-' are accepted in a domain name.
• Limited ASCII character set: 37 LDH (Letter-Digit-Hyphen) code points.
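The limits above can be checked mechanically. The following is a minimal sketch, not a complete hostname validator: the helper name `is_ldh_name` is hypothetical, lengths are counted on the textual form of the name, and additional hostname rules (for example, restrictions on leading or trailing hyphens) are deliberately left out.

```python
import re

MAX_LABEL_LEN = 63    # longest label allowed by RFC 1035
MAX_NAME_LEN = 255    # longest domain name allowed by RFC 1035

# The LDH repertoire: letters (case-insensitive), digits, and the hyphen.
# An explicit ASCII class is used so that no non-ASCII look-alike can match.
LDH_LABEL = re.compile(r"[A-Za-z0-9-]+")


def is_ldh_name(name: str) -> bool:
    """Return True if every label of `name` is LDH and within the length limits."""
    if name.endswith("."):          # tolerate a trailing root dot
        name = name[:-1]
    if not name or len(name) > MAX_NAME_LEN:
        return False
    for label in name.split("."):
        if not 1 <= len(label) <= MAX_LABEL_LEN:
            return False
        if LDH_LABEL.fullmatch(label) is None:
            return False
    return True


if __name__ == "__main__":
    print(is_ldh_name("twnic.net.tw"))    # True
    print(is_ldh_name("gwmöbler.com"))    # False: 'ö' is outside LDH
```

A name such as gwmöbler.com fails this check, which is the gap that the IDN standards described next are meant to close.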
International Domain Name
• The IETF IDN WG adopted Unicode 3.2.
• Scripts: Greek, Cyrillic, Armenian, Hebrew, Arabic, Syriac, Thaana, Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Sinhala, Thai, …
• 95,156 characters
International Domain Name sample
• レコード会社.jp
• gwmöbler.com
• 慎昌鐘錶.tw
• 阿克苏诺贝尔油漆公司.cn
• 소프트웨어.kr
• ישראל.קום
IETF IDN Standard
• IDNA (RFC 3490): Internationalizing Domain Names in Applications
• NAMEPREP (RFC 3491): A Stringprep Profile for Internationalized Domain Names
• PUNYCODE (RFC 3492): A Bootstring encoding of Unicode for Internationalized Domain Names in Applications
• STRINGPREP (RFC 3454): Preparation of Internationalized Strings
IDNA components and interfaces
[Figure: inside the end system, the User interacts with the IDNA-aware Application through input and display, using local interface methods (pen, keyboard, ...). ToASCII and ToUnicode operations may be called in the application. The Application calls the Resolver using ACE, and the Resolver talks to the DNS Servers over the DNS protocol using ACE. The Application talks to Application Servers over an application-specific protocol, using ACE unless the protocol is updated to handle other encodings.]
"Application" is where the application splits a hostname into labels, sets the appropriate flags, and performs the ToASCII and ToUnicode operations.
Example ACE label: xn--de-jg4avhby1noc0d
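The application-side step described above (split the hostname into labels, then run ToASCII or ToUnicode on each label) can be sketched with Python's standard `encodings.idna` module, which implements RFC 3490. The wrapper names `hostname_to_ascii` and `hostname_to_unicode` are hypothetical, and the flags mentioned on the slide (such as AllowUnassigned and UseSTD3ASCIIRules) are not exposed in this sketch.

```python
from encodings import idna


def hostname_to_ascii(hostname: str) -> str:
    """Split a dotted hostname into labels and apply ToASCII to each one."""
    return ".".join(
        idna.ToASCII(label).decode("ascii") for label in hostname.split(".")
    )


def hostname_to_unicode(hostname: str) -> str:
    """Split a dotted hostname into labels and apply ToUnicode to each one."""
    return ".".join(idna.ToUnicode(label) for label in hostname.split("."))


if __name__ == "__main__":
    ace = hostname_to_ascii("慎昌鐘錶.tw")
    print(ace)                        # xn--ciun9hb52c2za.tw
    print(hostname_to_unicode(ace))   # 慎昌鐘錶.tw
```

Only the ACE form is handed to the resolver; the Unicode form is used for input and display.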
IDNA Structure
[Figure: user input (Unicode) → NAMEPREP (mapping, normalization, prohibit), a STRINGPREP profile → ACE (PUNYCODE) → to resolver as ACE. IDNA operations: ToASCII, ToUnicode.]
Nameprep: A Stringprep Profile for Internationalized Domain Names
NAMEPREP: A Stringprep Profile for Internationalized Domain Names
• Mapping: Stringprep tables B.1, B.2
• Normalization: Form KC
• Prohibited output: Stringprep tables C.1.2, C.2.2, C.3, C.4, C.5, C.6, C.7, C.8, C.9
NAMEPREP -- Mapping
• Commonly mapped to nothing: 27
• Mapping for case-folding used with NFKC: 1371
  Ex:
  A → a (U+0041 → U+0061)
  Ϋ → ϋ (U+03AB → U+03CB)
  ㍱ → hpa (U+3371 → U+0068 U+0070 U+0061)
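The mapping step can be reproduced with Python's standard `stringprep` module, which provides the RFC 3454 tables over the Unicode 3.2 database. This is a minimal sketch of the mapping step only (tables B.1 and B.2); the function name `nameprep_map` is hypothetical, and normalization and the prohibited-output check are not performed here.

```python
import stringprep


def nameprep_map(label: str) -> str:
    """Apply only the Nameprep mapping step (Stringprep tables B.1 and B.2)."""
    mapped = []
    for ch in label:
        if stringprep.in_table_b1(ch):
            # Table B.1: commonly mapped to nothing, so the character is dropped.
            continue
        # Table B.2: case folding used together with NFKC.
        mapped.append(stringprep.map_table_b2(ch))
    return "".join(mapped)


if __name__ == "__main__":
    print(nameprep_map("A"))          # a
    print(nameprep_map("\u03ab"))     # ϋ (U+03CB)
    print(nameprep_map("\u3371"))     # hpa
    print(nameprep_map("a\u00adb"))   # ab (SOFT HYPHEN is mapped to nothing)
```

The SOFT HYPHEN case is an example of my own choosing for the "mapped to nothing" table; the other three inputs are the ones listed on the slide.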
NAMEPREP -- Normalization
• Unicode normalization with Form KC (NFKC)
NAMEPREP -- Normalization (examples)
• ‘u’ + ‘¨’ → ‘ü’
• ‘ａ’ → ‘a’
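Form KC normalization can be tried directly with Python's standard `unicodedata` module. The sketch below shows the two effects that matter for Nameprep: canonical composition (a base letter plus a combining mark becomes one precomposed character) and compatibility folding. The fullwidth letter is an illustrative input chosen here, and the helper name `nfkc` is hypothetical.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Return `text` in Unicode Normalization Form KC."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    composed = nfkc("u\u0308")        # 'u' followed by COMBINING DIAERESIS
    print(composed, len(composed))    # ü 1
    print(nfkc("\uff41"))             # a (FULLWIDTH LATIN SMALL LETTER A folded to ASCII)
```

Python's `unicodedata` uses the Unicode version bundled with the interpreter rather than Unicode 3.2, but these two results are the same in both.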
NAMEPREP -- Prohibited output
• Non-ASCII space characters: 17. Ex: U+00A0 (NO-BREAK SPACE)
• Non-ASCII control characters: 54. Ex: U+0090 (DEVICE CONTROL STRING)
• Private use: 133371
• Non-character code points: 49
• Surrogate codes: 2048
• Inappropriate for plain text: 4
• Inappropriate for canonical representation: 12
• Change display properties or are deprecated: 13
• Tagging characters: 97
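The prohibited-output check amounts to testing each character against the Stringprep C tables that Nameprep selects. The following sketch uses the membership functions of Python's standard `stringprep` module; the table labels and the helper name `prohibited_tables` are my own, and the bidirectional checks that Nameprep also requires are not covered.

```python
import stringprep

# (label, membership test) for each Stringprep table prohibited by Nameprep.
PROHIBITED_TABLES = [
    ("C.1.2 non-ASCII space", stringprep.in_table_c12),
    ("C.2.2 non-ASCII control", stringprep.in_table_c22),
    ("C.3 private use", stringprep.in_table_c3),
    ("C.4 non-character code point", stringprep.in_table_c4),
    ("C.5 surrogate code", stringprep.in_table_c5),
    ("C.6 inappropriate for plain text", stringprep.in_table_c6),
    ("C.7 inappropriate for canonical representation", stringprep.in_table_c7),
    ("C.8 change display properties or deprecated", stringprep.in_table_c8),
    ("C.9 tagging character", stringprep.in_table_c9),
]


def prohibited_tables(ch: str) -> list:
    """Return the labels of every prohibited-output table that contains `ch`."""
    return [label for label, in_table in PROHIBITED_TABLES if in_table(ch)]


if __name__ == "__main__":
    for ch in ["a", "\u00a0", "\u0090", "\ue000"]:
        print(f"U+{ord(ch):04X}", prohibited_tables(ch))
```

A label passes this step only if the returned list is empty for every character in it.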
PUNYCODE: A Bootstring encoding of Unicode for IDNA
• One of the ACE (ASCII Compatible Encoding) schemes
• Translates non-ASCII characters to ASCII characters
• Prefix: xn--
• Ex: 慎昌鐘錶.tw → xn--ciun9hb52c2za.tw
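The encoding step by itself can be reproduced with Python's built-in `punycode` codec. This is a sketch of the Punycode conversion and the ACE prefix only: it assumes the label has already been through Nameprep, and the helper names `label_to_ace` and `ace_to_label` are hypothetical.

```python
ACE_PREFIX = "xn--"


def label_to_ace(label: str) -> str:
    """Encode one (already nameprepped) label as an ACE label."""
    if label.isascii():
        return label                  # plain LDH labels are left untouched
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def ace_to_label(label: str) -> str:
    """Decode an ACE label back to Unicode; other labels pass through."""
    if not label.lower().startswith(ACE_PREFIX):
        return label
    return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


if __name__ == "__main__":
    print(label_to_ace("慎昌鐘錶"))             # xn--ciun9hb52c2za
    print(ace_to_label("xn--ciun9hb52c2za"))    # 慎昌鐘錶
```

Because the output uses only letters, digits, and hyphens, existing DNS servers and resolvers can carry it unchanged.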
Insufficiencies of the IDN standard
• The current IDN standard (IDNA, NAMEPREP, PUNYCODE) cannot meet the requirements of Chinese domain names.
• Traditional/Simplified Chinese mapping. Ex: 台 臺
• Writing variant mapping. Ex: 峰 峯
Insufficiencies of the IDN standard
• Characters with the same meaning are written differently in different countries:
  In China: 劝 (529D)
  In Japan: 勧 (52E7)
  In Taiwan: 勸 (52F8)
IDN administration guideline
• A registration policy to solve the problems listed above
• Every language has a variant table with 3 fields:
  valid code point
  recommended variant
  character variant
Variant Table sample
Columns: Valid code point (VCP) | Recommended variants by .tw (twRV) | Recommended variants by .cn (cnRV) | Character Variant(s) (CV) | Remarks
丁 (4E01) | 丁 (4E01) | 丁 (4E01) | 丁 (4E01) | Singular-relation character (1)
丄 (4E04) | 上 (4E0A) | 上 (4E0A) | 丄 (4E04) 上 (4E0A) | Pair-relation characters (2.1)
上 (4E0A) | 上 (4E0A) | 上 (4E0A) | 丄 (4E04) 上 (4E0A) |
万 (4E07) | 万 (4E07) | 万 (4E07) | 万 (4E07) 萬 (842C) | Pair-relation characters (2.2)
萬 (842C) | 萬 (842C) | 万 (4E07) | 万 (4E07) 萬 (842C) |
Variant Table sample (continued)
Columns: Valid code point (VCP) | Recommended variants by .tw (twRV) | Recommended variants by .cn (cnRV) | Character Variant(s) (CV) | Remarks
叶 (53F6) | 葉 (8449) | 叶 (53F6) | 叶 (53F6) 葉 (8449) | Pair-relation characters (2.3)
葉 (8449) | 葉 (8449) | 叶 (53F6) | 叶 (53F6) 葉 (8449) |
个 (4E2A) | 個 (500B) | 个 (4E2A) | 个 (4E2A) 個 (500B) 箇 (7B87) | Multiple-relation characters
個 (500B) | 個 (500B) | 个 (4E2A) | 个 (4E2A) 個 (500B) 箇 (7B87) |
箇 (7B87) | 個 (500B) | 个 (4E2A) | 个 (4E2A) 個 (500B) 箇 (7B87) |
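One way to hold such a table in a program is a record per valid code point with the fields VCP, twRV, cnRV, and CV. The sketch below is an assumption about representation, not the format used by TWNIC: the class name `VariantEntry`, the helper names, and the rule that derives the relation type from the number of character variants are all illustrative. Only a few of the sample rows are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantEntry:
    vcp: str      # valid code point
    tw_rv: str    # recommended variant for .tw
    cn_rv: str    # recommended variant for .cn
    cv: tuple     # all character variants, including the VCP itself


# A few rows copied from the sample table.
VARIANT_TABLE = {
    entry.vcp: entry
    for entry in [
        VariantEntry("丁", "丁", "丁", ("丁",)),
        VariantEntry("万", "万", "万", ("万", "萬")),
        VariantEntry("萬", "萬", "万", ("万", "萬")),
        VariantEntry("叶", "葉", "叶", ("叶", "葉")),
        VariantEntry("箇", "個", "个", ("个", "個", "箇")),
    ]
}


def code_point(ch: str) -> str:
    """Format a character's code point the way the table prints it, e.g. 4E01."""
    return f"{ord(ch):04X}"


def relation_type(entry: VariantEntry) -> str:
    """Illustrative rule: classify a row by its number of character variants."""
    return {1: "singular", 2: "pair"}.get(len(entry.cv), "multiple")


if __name__ == "__main__":
    for entry in VARIANT_TABLE.values():
        print(entry.vcp, code_point(entry.vcp), relation_type(entry))
```

Keying the dictionary by VCP makes the per-character lookup needed at registration time a single dictionary access.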
Variant Table
• Singular-relation character (VCP=twRV=cnRV=CV): 13888 (66.4%)
• VCP=twRV≠cnRV: 2783 (13.3%)
• VCP=cnRV≠twRV: 2453 (11.7%)
• VCP≠(twRV=cnRV): 333 (1.6%)
• VCP≠twRV≠SCR: 387 (1.9%)
Variant Table
Number of character variant(s) | Number of characters | Share
1 | 13888 | 66.4%
2 | 5156 | 24.7%
3 | 1158 | 5.5%
4 | 424 | 2.0%
5 | 165 | 0.79%
6 | 60 | 0.29%
7 | 35 | 0.17%
8 | 16 | 0.08%
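The percentages in this distribution follow from the counts, which sum to 20,902 characters. The short check below recomputes each share from the counts on the slide; the variable names are my own.

```python
# Number of characters for each number of character variants (from the slide).
counts = {1: 13888, 2: 5156, 3: 1158, 4: 424, 5: 165, 6: 60, 7: 35, 8: 16}

total = sum(counts.values())
shares = {n: 100 * count / total for n, count in counts.items()}

if __name__ == "__main__":
    print("total:", total)            # total: 20902
    for n, share in shares.items():
        print(f"{n} variant(s): {counts[n]:>6} ({share:.2f}%)")
```

About two thirds of the characters have no variant other than themselves, so variant handling affects only the remaining third.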
Variant Table
• The table draft has been prepared by the CCMT task force, which TWNIC organized in January 2002.
• The task force consists of 9 experts, including linguists, computer experts, and DNS experts.
• The table draft has been submitted to the Bureau of Standards, Ministry of Economic Affairs, for final review.
Registration procedure
• The registrant should select the language(s).
• The registry should activate the requested domain name(s) and reserve the equivalent name(s), within the language-based character set.
• The registrant can request activation of the reserved equivalent domain name(s) at any time.
Registration sample
A user selects the zh-tw and zh-cn languages with the domain name 丁上萬.com:
• 丁上萬.com (recommended variant for zh-tw)
• 丁上万.com (recommended variant for zh-cn)
• 丁丄万.com (character variant)
• 丁丄萬.com (character variant)
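The names in this sample can be derived from the variant table: the recommended variant for a language replaces each character with that language's recommended form, and the full set of equivalents is every combination of the character variants. The sketch below assumes a simplified dictionary layout holding only the three characters of the sample; the names `VARIANTS`, `recommended`, and `all_variants` are hypothetical, and which names the registry activates or reserves is a policy decision that the code does not model.

```python
from itertools import product

# Rows for the three characters of the sample, taken from the variant table.
VARIANTS = {
    "丁": {"zh-tw": "丁", "zh-cn": "丁", "cv": ("丁",)},
    "上": {"zh-tw": "上", "zh-cn": "上", "cv": ("丄", "上")},
    "萬": {"zh-tw": "萬", "zh-cn": "万", "cv": ("万", "萬")},
}


def recommended(label: str, lang: str) -> str:
    """Replace every character with the recommended variant for `lang`."""
    return "".join(VARIANTS[ch][lang] for ch in label)


def all_variants(label: str) -> set:
    """Return every label obtainable by substituting character variants."""
    choices = [VARIANTS[ch]["cv"] for ch in label]
    return {"".join(combo) for combo in product(*choices)}


if __name__ == "__main__":
    label = "丁上萬"
    print(recommended(label, "zh-tw"))    # 丁上萬
    print(recommended(label, "zh-cn"))    # 丁上万
    print(sorted(all_variants(label)))    # the four names listed in the sample
```

The number of equivalents is the product of the variant counts per character (here 1 × 2 × 2 = 4), so it can grow quickly for longer labels.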
Q & A