
Re: Internationalized Domain Names (IDNs) in progress

From: Najib Tounsi <ntounsi@emi.ac.ma>
Date: Thu, 25 Oct 2007 23:22:07 +0000
Message-ID: <4721251F.3030109@emi.ac.ma>
To: Martin Duerst <duerst@it.aoyama.ac.jp>
CC: Daniel Dardailler <danield@w3.org>, 'WWW International' <www-international@w3.org>, W3C Offices <w3c-office-pr@w3.org>, public-i18n-core@w3.org
Hello Martin,

A few more comments.

Martin Duerst wrote:
>  ... Scaling issues are definitely difficult to predict, but for that,
>  most kinds of testing don't help. And adding IDN TLDs shouldn't be
>  done in a way that suddenly increases the number of TLDs by an order
>  of magnitude. Adding just the IDN ccTLDs in what I call the first
>  stage below will probably less than double the overall number of
>  ccTLDs.

+1 for this step-by-step approach.

>  ... So I think these ccTLDs should be staged, first creating those
>  for scripts that are widely used in a particular country (as
>  criteria, things such as "does that script appear on the country's
>  coins or banknotes", "is that script used in official publications"
>  and so on can be used). The second stage may include minorities (e.g.
>  Arabic for France, Punjabi for the UK, maybe Tamil for Switzerland
>  and so on). The third stage would address tourists, and the fourth
>  stage, if ever, could try to reach full coverage.

If I understand you correctly, taking France as an example, the TLDs would
be a small set like: {.fr (of course), .FRANSA (the name of France written
in Arabic script), .AlphaBetaGamma (in Greek script) ...}, all equivalent to
the original .fr.
And for Morocco: {.ma (French is the de facto second official language),
.ALMAGHREB (the name of Morocco written in Arabic script), .XYZ (in
Tifinagh), .BetaGammaAlpha (in Greek script) ...}. As an Arabic speaker,
I'll be "happy" to type .FRANSA in Arabic for my preferred YAHOO.FRANSA site.
Luckily, my keyboard is bilingual.
But note that there is a sub-problem here. Maybe Morocco will have the
advantage :-) of gaining a full name (vs. a short code), .ALMAGHREB, as
its brand-new TLD, whereas France will keep its .fr and might ask
for the same advantage by creating a .france TLD.

>  ... Please note that above, we are always speaking about script, not
>  language.

BTW, I've noticed that ICANN is currently testing both Arabic and Persian
TLDs. Both languages are written in the same Arabic script. Isn't one of the
tests redundant?

>  ...
>  I have read John's various documents on this topic, have listened to
>  talks from him, and have discussed things directly with him. He
>  raises a few good points, but many times makes an elephant out of a
>  mouse, or lets readers think about elephants when they should think
>  about mice.

I didn't know about this author before, and I have just read some of his
writings about IDNs. If I understand his approach, he suggests dealing
with IDNs at the level of the user interface. The idea is to map TLDs
locally: if a TLD is in local characters, it is translated to
the standard ASCII form, using a translation table in which names in local
form are stored with their corresponding ASCII form (.fr, .uk, .com, etc.).
I remember from a recent mail by Sarmad Hussain (see the thread
http://lists.w3.org/Archives/Public/public-i18n-core/2007AprJun/0181.html)
that this approach is also followed locally for the Urdu language. From the
above thread, it appears that this is not recommended. There is at least an
interoperability problem. You type your native IDN on your computer, and
your local browser can map it because it has been set up locally to do so
(e.g. by a plug-in). You copy-paste this same IDN and send it to me. My
browser can't recognize it.
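
To make the interoperability problem concrete, here is a minimal sketch in
Python of such a local translation table. Everything in it is an assumption
made for illustration: the table entries, the function name map_locally and
the host name are hypothetical, and this is not how any real plug-in or
resolver is known to work.

```python
# Hypothetical local table: native-script TLD label -> ASCII TLD.
# The entries are illustrative only, not real registrations.
LOCAL_TLD_MAP = {
    "فرنسا": "fr",   # "France" written in Arabic script
    "المغرب": "ma",  # "Morocco" written in Arabic script
}


def map_locally(hostname, table):
    """Rewrite the last label of hostname using table.

    Returns the ASCII form, or None when the label is not ASCII and
    this client has no mapping for it.
    """
    head, _, tld = hostname.rpartition(".")
    if tld.isascii():
        return hostname  # already in the standard ASCII form
    ascii_tld = table.get(tld)
    if ascii_tld is None:
        return None  # unknown label: nothing to resolve
    return f"{head}.{ascii_tld}" if head else ascii_tld


sender_table = LOCAL_TLD_MAP  # the sender installed the mapping
receiver_table = {}           # the receiver did not

name = "yahoo.فرنسا"
print(map_locally(name, sender_table))    # usable on the sender's machine
print(map_locally(name, receiver_table))  # None: the pasted name is useless
```

The same string gives two different outcomes depending on whose table is
consulted, which is exactly the copy-paste failure described above.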

>  ... Well, licence plates are a good example actually. In Japan, they
>  use Kanji and Hiragana. In Germany, they use Umlauts. In many Arabic
>  countries, they use Arabic letters and numerals. Najib can tell us
>  what Morocco does.

Well, years ago, there were numbers like "1363-1 4", as shown at
http://www.worldlicenseplates.com/
Now, Arabic letters are used, like " 12345-X 9", where X is actually the
Arabic Beh 'ب', which comes after Alef 'أ'.
As an aside, this didn't make everybody happy. At the vehicle
administration and at insurance companies, computer applications are not
yet ready to accept non-Latin scripts. I've been told that, as a
temporary solution, Alef is replaced by A and Beh by B.


>  ...  There is also nothing prohibiting somebody from serving a domain
>  both with one (or several) IDNs as well as with a US-ASCII-only
>  domain name.
>

I guess that this will be a very common practice.

>
>  See above. The usability benefits for the local population, including
>  the local police, come first. If 99.9% of the potential users can
>  remember or note down a car licence number faster because it uses the
>  native script, but 0.1% of potential users don't manage to remember
>  it or note it down, then that's a net average gain.

I completely agree with this kind of opinion, which is optimistic in nature.
However, in I18N, the situation is not always symmetric between countries.
With globalization, I don't know whether the "99.9% vs. 0.1%" rule will
still hold. It is well established that in developing countries, speaking
a Western language (and thus one written in Latin script) is an asset. But
this is another question.


Regards,

Najib
Received on Thursday, 25 October 2007 23:22:36 GMT
