RE: IDNs, do they work? (some scripts are less equal than others).

Safari results now added to http://www.w3.org/International/tests/results/results-idn-display.php

Many thanks, Najib.

RI
 
============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)
 
http://www.w3.org/People/Ishida/
http://www.w3.org/International/
http://people.w3.org/rishida/blog/
http://www.flickr.com/photos/ishida/
 
 


________________________________

 From: Najib Tounsi [mailto:ntounsi@emi.ac.ma] 
 Sent: 24 March 2007 10:04
 To: Richard Ishida
 Cc: 'WWW International'; 'W3C Offices'
 Subject: Re: IDNs, do they work? (some scripts are less equal than others).
 
 

 I18N Tests: IDN display 1



 These tests check whether a user agent displays IDNs (Internationalized Domain Names) as Unicode or punycode in the status bar. User agents that try to detect possible homograph attacks do so in different ways. These tests explore some of those approaches. They are not exhaustive, and the results may change over time, since there is no standard for how to proceed in this respect, and some of the tests are based on lists that may change.

 For more information about what to expect see the article An Introduction to Multilingual Web Addresses <http://www.w3.org/International/articles/idn-and-iri/#work> .

 Mouse over the links in the tests and note whether the domain name is displayed as punycode or Unicode characters in the status bar.
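As a rough illustration of the two forms being compared, here is a minimal Python sketch that takes one of the percent-encoded links from the tests and derives both the Unicode host name and its punycode (ACE) equivalent. The function name `display_forms` is made up for this example, and the standard-library `idna` codec follows the older IDNA 2003 rules, so it only approximates what any given browser does internally.

```python
from typing import Tuple
from urllib.parse import unquote, urlsplit


def display_forms(link: str) -> Tuple[str, str]:
    """Return (unicode_host, punycode_host) for a percent-encoded test link.

    Hypothetical helper for illustration; not part of the test suite.
    """
    # Host part of the link, with the UTF-8 percent-escapes still in place.
    host = urlsplit(link).hostname
    # Decode the percent-escapes to get the Unicode form of the host name.
    unicode_host = unquote(host)
    # The stdlib "idna" codec converts each non-ASCII label to an xn-- label.
    punycode_host = unicode_host.encode("idna").decode("ascii")
    return unicode_host, punycode_host


if __name__ == "__main__":
    link = ("http://example.%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB"
            "%D0%B8%D1%86%D0%B0.ru")
    unicode_host, punycode_host = display_forms(link)
    print(unicode_host)   # the form a "U" result corresponds to
    print(punycode_host)  # the form a "P" result corresponds to
```

A browser that reports "Unicode" in these tests shows something like the first string in the status bar; one that reports "punycode" shows the second, with an `xn--` label in place of the non-ASCII one.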

 Run each test twice. First with only en or en-US listed in the browser language preferences, and secondly with the following additional languages in the preferences: Russian, Japanese, German, Greek, Hindi, Armenian, Thai and 'am' (user defined code for Amharic).


 Result of Safari Tests:


 It seems that only Cyrillic and Greek IDNs (plus the Ethiopic examples in section 2) are displayed as punycode in the status bar of Safari 2.0.1. All other IDNs are displayed as Unicode. This holds in both runs: Run 1 (en is the only language preference) and Run 2 (with the above additional languages, but without Amharic).

 In the following, UU means Unicode display in both the first and the second run. PP means punycode display in both the first and the second run. There are no UP or PU results.


 1 Latin characters


 1. charþ.is <http://example.char%C3%BE.is>  (Latin1 character supported by .is TLD, but not .hu TLD)  UU 
 2. charő.hu <http://example.char%C5%91.hu>  (Extended Latin character supported by .hu TLD but not .is TLD) UU 
 3. charþ.hu <http://example.char%C3%BE.hu>  (Latin1 character supported by .is TLD, but not .hu TLD) UU 
 4. charő.is <http://example.char%C5%91.is>  (Extended Latin character supported by .hu TLD but not .is TLD) UU 
 5. charþ.com <http://example.char%C3%BE.com>  (Latin1 character supported by .is TLD, but not .hu TLD) UU 
 6. charő.com <http://example.char%C5%91.com>  (Extended Latin character supported by .hu TLD but not .is TLD) UU 
 7. charþ.xy <http://example.char%C3%BE.xy>  (Latin1 character supported by .is TLD, but not .hu TLD) UU 
 8. charő.xy <http://example.char%C5%91.xy>  (Extended Latin character supported by .hu TLD but not .is TLD) UU 
 9. charþ.fi <http://example.char%C3%BE.fi>  (Latin1 character not supported by .fi TLD) UU 
 10. charő.fi <http://example.char%C5%91.fi>  (Extended Latin character not supported by .fi TLD) UU 

 Run 1: with HTTP_ACCEPT_LANGUAGE = en. Safari 2.0 displays IDNs as Unicode in the status bar for all 10 cases above.

 Run 2 (HTTP_ACCEPT_LANGUAGE different from en-*): Same as Run 1.


 2 Non-Latin characters


 1. кириллица.ru <http://example.%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0.ru>  (Cyrillic characters) PP 
 2. ελληνικά.gr <http://example.%CE%B5%CE%BB%CE%BB%CE%B7%CE%BD%CE%B9%CE%BA%CE%AC.gr>  (Greek characters) PP 
 3. 漢字.jp <http://example.%E6%BC%A2%E5%AD%97.jp>  (Kanji characters) UU 
 4. かな.jp <http://example.%E3%81%8B%E3%81%AA.jp>  (Hiragana characters) UU 
 5. यूनिकोड.in <http://example.%E0%A4%AF%E0%A5%82%E0%A4%A8%E0%A4%BF%E0%A4%95%E0%A5%8B%E0%A4%A1.in>  (Devanagari characters) UU 
 6. кириллица.fi <http://example.%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0.fi>  (Cyrillic characters are not allowed in .fi TLDs) PP  
 7. ελληνικά.fi <http://example.%CE%B5%CE%BB%CE%BB%CE%B7%CE%BD%CE%B9%CE%BA%CE%AC.fi>  (Greek characters are not allowed in .fi TLDs) PP 
 8. 漢字.fi <http://example.%E6%BC%A2%E5%AD%97.fi>  (Kanji characters are not allowed in .fi TLDs) UU 
 9. यूनिकोड.fi <http://example.%E0%A4%AF%E0%A5%82%E0%A4%A8%E0%A4%BF%E0%A4%95%E0%A5%8B%E0%A4%A1.fi>  (Devanagari characters are not allowed in .fi TLDs) UU 
 10. यूनिकोड.de <http://example.%E0%A4%AF%E0%A5%82%E0%A4%A8%E0%A4%BF%E0%A4%95%E0%A5%8B%E0%A4%A1.de>  (Devanagari characters are not allowed in .de TLDs) UU 
 11. Հայերեն.de <http://example.%D5%80%D5%A1%D5%B5%D5%A5%D6%80%D5%A5%D5%B6.de>  (Armenian characters) UU 
 12. Հայերեն.am <http://example.%D5%80%D5%A1%D5%B5%D5%A5%D6%80%D5%A5%D5%B6.am>  (Armenian characters) UU 
 13. ภาษาไทย.th <http://example.%E0%B8%A0%E0%B8%B2%E0%B8%A9%E0%B8%B2%E0%B9%84%E0%B8%97%E0%B8%A2.th>  (Thai characters) UU 
 14. ภาษาไทย.com <http://example.%E0%B8%A0%E0%B8%B2%E0%B8%A9%E0%B8%B2%E0%B9%84%E0%B8%97%E0%B8%A2.com>  (Thai characters) UU 
 15. ህሔራዊነት.de <http://example.%E1%88%85%E1%88%94%E1%88%AB%E1%8B%8A%E1%8A%90%E1%89%B5.de>  (Amharic, Ethiopic characters) PP 
 16. ህሔራዊነት.er <http://example.%E1%88%85%E1%88%94%E1%88%AB%E1%8B%8A%E1%8A%90%E1%89%B5.er>  (Amharic, Ethiopic characters) PP 

 Run 1: with HTTP_ACCEPT_LANGUAGE = en. Safari 2.0 displays IDNs as punycode in the status bar for cases 1, 2, 6, 7, 15 and 16, and as Unicode for the other cases.
 Note :-) The status bar in Safari is not tall enough to see the Thai text in cases 13 and 14.

 Run 2: Same as Run 1 


 3 Non-Latin characters mixed with Latin


 1. кириллицаascii.ru <http://example.%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0ascii.ru>  (Cyrillic + ascii characters) PP 
 2. ελληνικάascii.gr <http://example.%CE%B5%CE%BB%CE%BB%CE%B7%CE%BD%CE%B9%CE%BA%CE%ACascii.gr>  (Greek + ascii characters) PP 
 3. 漢字ascii.jp <http://example.%E6%BC%A2%E5%AD%97ascii.jp>  (Kanji + ascii characters) UU 
 4. かなascii.jp <http://example.%E3%81%8B%E3%81%AAascii.jp>  (Hiragana + ascii characters) UU 
 5. यूनिकोडascii.in <http://example.%E0%A4%AF%E0%A5%82%E0%A4%A8%E0%A4%BF%E0%A4%95%E0%A5%8B%E0%A4%A1ascii.in>  (Devanagari + ascii characters) UU 
 6. кириллицаascii.de <http://example.%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0ascii.de>  (Cyrillic + ascii characters) PP 
 7. ελληνικάascii.de <http://example.%CE%B5%CE%BB%CE%BB%CE%B7%CE%BD%CE%B9%CE%BA%CE%ACascii.de>  (Greek + ascii characters) PP 
 8. 漢字ascii.de <http://example.%E6%BC%A2%E5%AD%97ascii.de>  (Kanji + ascii characters) UU 
 9. かなascii.de <http://example.%E3%81%8B%E3%81%AAascii.de>  (Hiragana + ascii characters) UU 
 10. यूनिकोडascii.de <http://example.%E0%A4%AF%E0%A5%82%E0%A4%A8%E0%A4%BF%E0%A4%95%E0%A5%8B%E0%A4%A1ascii.de>  (Devanagari + ascii characters) UU 
 11. кириллицchará.ru <http://example.%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0char%C3%A1.ru>  (Cyrillic + accented Latin characters) PP 
 12. ελληνικάchará.gr <http://example.%CE%B5%CE%BB%CE%BB%CE%B7%CE%BD%CE%B9%CE%BA%CE%ACchar%C3%A1.gr>  (Greek + accented Latin characters) PP 
 13. 漢字chará.jp <http://example.%E6%BC%A2%E5%AD%97char%C3%A1.jp>  (Kanji + accented Latin characters) UU
 14. かなchará.jp <http://example.%E3%81%8B%E3%81%AAchar%C3%A1.jp>  (Hiragana + accented Latin characters) UU 
 15. यूनिकोडchará.in <http://example.%E0%A4%AF%E0%A5%82%E0%A4%A8%E0%A4%BF%E0%A4%95%E0%A5%8B%E0%A4%A1char%C3%A1.in>  (Devanagari + accented Latin characters) UU 
 16. кириллицchará.de <http://example.%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86char%C3%A1.de>  (Cyrillic + accented Latin characters) PP 
 17. ελληνικάchará.de <http://example.%CE%B5%CE%BB%CE%BB%CE%B7%CE%BD%CE%B9%CE%BA%CE%ACchar%C3%A1.de>  (Greek + accented Latin characters) PP 
 18. 漢字chará.de <http://example.%E6%BC%A2%E5%AD%97char%C3%A1.de>  (Kanji + accented Latin characters) UU
 19. かなchará.de <http://example.%E3%81%8B%E3%81%AAchar%C3%A1.de>  (Hiragana + accented Latin characters) UU 
 20. यूनिकोडchará.de <http://example.%E0%A4%AF%E0%A5%82%E0%A4%A8%E0%A4%BF%E0%A4%95%E0%A5%8B%E0%A4%A1char%C3%A1.de>  (Devanagari + accented Latin characters) UU 
 21. pаypal.com <http://p%D0%B0ypal.com>  (The first a is cyrillic) PP 

 Run 1: with HTTP_ACCEPT_LANGUAGE = en. Safari 2.0 displays IDNs as punycode in the status bar for cases 1, 2, 6, 7, 11, 12, 16, 17 and 21, and as Unicode for the other cases.
 

 Run 2: Same as Run 1 


 4 Kanji and kana characters mixed


 1. 漢字かな.jp <http://example.%E6%BC%A2%E5%AD%97%E3%81%8B%E3%81%AA.jp>  UU 
 2. 漢字かな.de <http://example.%E6%BC%A2%E5%AD%97%E3%81%8B%E3%81%AA.de>  UU 
 3. 漢字かな.ru <http://example.%E6%BC%A2%E5%AD%97%E3%81%8B%E3%81%AA.ru>  UU 
 4. 漢字かな.in <http://example.%E6%BC%A2%E5%AD%97%E3%81%8B%E3%81%AA.in>  UU 
 5. 漢字かなascii.jp <http://example.%E6%BC%A2%E5%AD%97%E3%81%8B%E3%81%AAascii.jp>  UU 
 6. 漢字かなchará.jp <http://example.%E6%BC%A2%E5%AD%97%E3%81%8B%E3%81%AAchar%C3%A1.jp>  UU 

 Run 1: with HTTP_ACCEPT_LANGUAGE = en. Safari 2.0 displays IDNs  as Unicode in the status bar for all the above 6 cases.

 Run 2: Same as Run 1 


 5 Non-Latin mixtures


 1. кириллица漢字.ru <http://example.%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0%E6%BC%A2%E5%AD%97.ru>  (Cyrillic + kanji characters) PP
 2. кириллица漢字.jp <http://example.%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0%E6%BC%A2%E5%AD%97.jp>  (Cyrillic + kanji characters) PP
 3. यूनिकोड漢字.in <http://example.%E0%A4%AF%E0%A5%82%E0%A4%A8%E0%A4%BF%E0%A4%95%E0%A5%8B%E0%A4%A1%E6%BC%A2%E5%AD%97.in>  (Devanagari + kanji characters) UU
 4. यूनिकोड漢字.jp <http://example.%E0%A4%AF%E0%A5%82%E0%A4%A8%E0%A4%BF%E0%A4%95%E0%A5%8B%E0%A4%A1%E6%BC%A2%E5%AD%97.jp>  (Devanagari + kanji characters) UU
 5. ελληνικά漢字.jp <http://example.%CE%B5%CE%BB%CE%BB%CE%B7%CE%BD%CE%B9%CE%BA%CE%AC%E6%BC%A2%E5%AD%97.jp>  (Greek + kanji characters) PP
 6. ελληνικά漢字.gr <http://example.%CE%B5%CE%BB%CE%BB%CE%B7%CE%BD%CE%B9%CE%BA%CE%AC%E6%BC%A2%E5%AD%97.gr>  (Greek + kanji characters) PP

 Run 1: with HTTP_ACCEPT_LANGUAGE = en. Safari 2.0 displays IDNs as Unicode in the status bar for cases 3 and 4, and as punycode for the other cases.
 

 Run 2: Same as Run 1 


 6 Unusual characters


 1. example.com⁄foo.museum <http://example.com%E2%81%84foo.museum>  (Fraction slash in domain name) UU 
 2. I♥NY.museum <http://example.I%E2%99%A5NY.museum>  (Non-alphabetic character) UU 

 Run 1: with HTTP_ACCEPT_LANGUAGE = en. Safari 2.0 displays IDNs  as Unicode in the status bar for both cases.

 Run 2: Same as Run 1 
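The results above are consistent with a simple per-script rule: a name was shown as punycode whenever it contained at least one Cyrillic, Greek or Ethiopic character, regardless of TLD or language preferences. The Python sketch below encodes that observation only; it is an inference from these test results, not Safari's documented algorithm, and the names `PUNYCODE_SCRIPTS` and `predicted_display` are made up for this example.

```python
import unicodedata

# Scripts whose presence coincided with punycode (PP) display in the results
# reported in this message. This is an observation drawn from the tests,
# not a rule taken from Safari documentation.
PUNYCODE_SCRIPTS = ("CYRILLIC", "GREEK", "ETHIOPIC")


def predicted_display(domain: str) -> str:
    """Return 'P' (punycode) or 'U' (Unicode) under the hypothetical rule."""
    for ch in domain:
        # Unicode character names begin with the script name for these
        # scripts, e.g. "CYRILLIC SMALL LETTER A".
        name = unicodedata.name(ch, "")
        if name.startswith(PUNYCODE_SCRIPTS):
            return "P"
    return "U"


if __name__ == "__main__":
    samples = ["кириллица.ru", "漢字.jp", "यूनिकोड.in", "p\u0430ypal.com"]
    for domain in samples:
        print(domain, predicted_display(domain))
```

Matching on character names keeps the sketch within the standard library; a more careful version would use the Unicode Script property instead.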


 Version: $Id: test-idn-display-1.html,v 1.6 2007/03/23 18:03:17 rishida Exp $

Received on Tuesday, 27 March 2007 10:34:54 UTC