Re: URIs and i18n

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Wed, 30 Jul 2008 12:49:44 +0900
Message-Id: <6.0.0.20.2.20080730115923.0816b558@localhost>
To: "Ram Mohan" <rmohan@afilias.info>, "Richard Ishida" <ishida@w3.org>, <www-international@w3.org>
Cc: "'Jean-Guilhem Rouel'" <jean-gui@w3.org>

At 00:50 08/07/30, Ram Mohan wrote:

>In the case of domain name registrations, please keep in mind that the native script registration is stored inside domain name registry systems as ASCII, and is shown in a user-unfriendly format with a prefix of <xn-->.

"is shown" depends on what software you use. It's true for IE 6
without a plugin, for example, but newer browsers don't do that
anymore.
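
To make the stored form versus the displayed form concrete, here is a
minimal sketch. It assumes Python's built-in "idna" codec (which follows
the older IDNA 2003 rules) and a made-up name, "bücher.example"; the
helper names to_ascii and to_unicode are only illustrative and do not
describe what any particular registry or browser actually runs.

```python
# Sketch only: Python's built-in "idna" codec implements IDNA 2003.
# The hostname and the helper names are illustrative assumptions.

def to_ascii(hostname: str) -> str:
    """Return the ASCII ("xn--") form, as a registry would store it."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Return the native-script form, as newer software can display it."""
    return hostname.encode("ascii").decode("idna")


stored = to_ascii("b\u00fccher.example")   # "bücher.example"
shown = to_unicode(stored)

print(stored)  # xn--bcher-kva.example
print(shown)   # bücher.example
```

Whether the user ever sees the first form is purely a matter of whether
the software applies the second conversion before display.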


>Registering two domain names, one in the native script and another in ASCII only, is a useful trick; this will become an even more useful method when domain names transform from <native script label>.<ascii domain> to <native script label>.<native script domain> (otherwise called IDN TLDs).
>
>Use of localized labels in email poses greater challenges, since automatic downgrades to ASCII are required when the original string is sent in a local script.  That's a discussion for another day.

The IETF is working on a set of documents describing how this is
going to work. Some of the main ones were approved just recently
by the IESG. They currently have Experimental status, but that is
expected to change. Also, this week at the IETF meeting in Dublin,
various implementers are running interoperability tests and
demonstrations.

>One of the biggest concerns is with script mixing, where ASCII and several local scripts get intermingled in an IRI.  In my opinion, this is quite a bad thing, leads to a great deal of user confusion and potential for phishing - it's one of the biggest things that should be explicitly restricted (a few languages exist where script mixing is required, but these are finite and definable as exceptions).

I understand the dangers of mixing several scripts (not necessarily
including ASCII, which isn't a script anyway) in the same IRI
component (e.g. label for a domain name). Using different components
with different scripts isn't really much of a problem, except for
very rare and special cases (e.g. a Cyrillic component that looks
exactly like a Latin one in an otherwise Latin IRI).
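
The per-component distinction can be sketched roughly as follows. This
is a heuristic, not a real implementation: it guesses each letter's
script from the first word of its Unicode character name (the standard
library has no script property), and the function names and example
hostnames are made up for illustration.

```python
import unicodedata

# Heuristic sketch: the first word of a character's Unicode name
# ("LATIN", "CYRILLIC", "GREEK", ...) stands in for its script.
# This is an assumption for illustration, not the Unicode Script property.


def label_scripts(label: str) -> set:
    """Return the set of (approximate) scripts used by letters in a label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():  # ignore digits and hyphens
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def mixed_script_labels(hostname: str) -> list:
    """Return the labels that mix more than one script internally."""
    return [lab for lab in hostname.split(".") if len(label_scripts(lab)) > 1]


# Latin label with one Cyrillic letter (U+0430) inside: flagged.
print(mixed_script_labels("p\u0430ypal.example"))

# Different scripts in different labels: not flagged.
print(mixed_script_labels("\u043f\u0440\u0438\u043c\u0435\u0440.example"))

# The rare case: an all-Cyrillic label that looks Latin is not flagged either.
print(mixed_script_labels("\u0441\u043e\u0441\u043e.example"))
```

The last call shows why the whole-component lookalike case needs
separate handling: a per-label mixing check alone cannot catch it.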

Regards,    Martin.


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     
Received on Wednesday, 30 July 2008 06:28:32 GMT