Re: Internationalized Domain Names (IDNs) in progress

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Fri, 26 Oct 2007 15:21:18 +0900
Message-Id: <6.0.0.20.2.20071026145844.07029bb0@localhost>
To: Najib Tounsi <ntounsi@emi.ac.ma>
Cc: Daniel Dardailler <danield@w3.org>, "'WWW International'" <www-international@w3.org>, W3C Offices <w3c-office-pr@w3.org>, public-i18n-core@w3.org

At 08:22 07/10/26, Najib Tounsi wrote:
>Hello Martin,
>
>Few more comments.
>
>Martin Duerst wrote:
>> ... Scaling issues are definitely difficult to predict, but for that,
>> most kinds of testing doesn't help. And adding IDN TLDs shouldn't be
>> done in a way that suddenly increases the number of TLDs by an order
>> of magnitude. Adding just the IDN ccTLDs in what I call the first
>> stage below will probably less than double the overall number of
>> ccTLDs.
>
>+1. for this by-steps approach.
>
>> ... So I think these ccTLDs should be staged, first creating those
>> for scripts that are widely used in a particular country (as
>> criteria, things such as "does that script appear on the country's
>> coins or banknotes", "is that script used in official publications"
>> and so on can be used). The second stage may include minorities (e.g.
>> Arabic for France, Punjabi for the UK, maybe Tamil for Switzerland
>> and so on). The third stage would address tourists, and the fourth
>> stage, if ever, could try to reach full coverage.
>
>If I understand you well, for France as an example, TLDs would be a small set like: {.fr (of course), .FRANSA (France name written in Arabic script), .AlphaBetaGamma (in Greek script) ...}, all equivalent to the original .fr.

In the first stage, only .fr would be needed. In the second stage,
an Arabic script equivalent could be added. I don't know how many
Greek speakers there are in France, so I don't know which stage to
put Greek in.

As for 'equivalent', I think it's up to the internet community and
authorities in each country to decide exactly how to handle that.
One solution is to register and serve exactly the same second-level
domains in all of a country's ccTLDs, another is to strictly separate
them by script, and so on.
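
To make the two options above a bit more concrete, here is a rough
sketch in Python. The policy names "mirror" and "separate", the
function visible_names, and the placeholder TLD "FRANSA" (standing in
for an Arabic-script label, as in the quoted text) are all made up for
illustration; how equivalence is actually handled would be up to each
country.

```python
# Hypothetical set of ccTLDs belonging to one country.
# "FRANSA" is a placeholder for a label written in Arabic script.
COUNTRY_TLDS = {"fr", "FRANSA"}


def visible_names(label, registered_under, policy):
    """Return the full names under which a second-level label is served.

    policy "mirror":   the same label is served in all of the country's ccTLDs.
    policy "separate": the label exists only in the ccTLD it was registered in.
    """
    if registered_under not in COUNTRY_TLDS:
        raise ValueError("unknown ccTLD: " + registered_under)
    if policy == "mirror":
        return sorted(label + "." + tld for tld in COUNTRY_TLDS)
    if policy == "separate":
        return [label + "." + registered_under]
    raise ValueError("unknown policy: " + policy)


print(visible_names("example", "fr", "mirror"))
print(visible_names("example", "fr", "separate"))
```

Other arrangements between these two extremes are of course possible.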

Also, I would clearly advise that the Arabic equivalent also be
a short code, rather than a full word.


>And for Morocco {.ma (French is de-facto 2nd official language), .ALMAGHREB (Morocco name written in Arabic script), .XYZ (in Tifinagh), .BetaGammaAlpha (for Greek) ...}.

Yes. There is no need for a justification for .ma, it already exists,
and of course shouldn't disappear. Arabic clearly belongs in the
first stage. Again, I think a two-letter abbreviation would be
better. There is already a standard (ASMO?) for two-letter country
codes for Arabic covering the countries using the Arabic script.
Tifinagh probably also belongs in the first stage, or maybe
the second. The issue here is to not be too strict about the stages,
it's more a question of how quickly reasonable proposals can be
made. It may be easier for the few countries where Tifinagh is
actually used to come up with a proposal for ccTLDs for each,
whereas Arabic may take more time, or it may be that Arabic
can use the above-mentioned standard and be faster.
As for Greek, I have no idea how many people actually living in
Morocco use Greek, my guess is that this would be third or fourth
stage.


>Me, as an Arabic speaker, I'll be "happy" to type .FRANCA in Arabic for my preferred YAHOO.FRANCA site. Luckily, my keyboard is bilingual.
>But note that there is a sub-problem here. Maybe Morocco will have the advantage :-) of gaining the full name (vs. a short code) .ALMAGHREB as its brand new TLD, whereas France will keep its .fr, and might ask for the same advantage to create a .france TLD.

That's one of the reasons (but not the only one) why I
advocate using short codes (mostly two-letter; in the case
of Han ideographs, one character would be enough, and probably
also one syllable for Hangul and potentially for Ethiopic).


>> ... Please note that above, we are always speaking about script, not
>> language.
>
>BTW, I've noted that ICANN actually tests Arabic + Persian TLDs. Both are languages based on the same Arabic script. One test is redundant?

I don't think these are redundant. First, please note that somebody
was careful enough to make sure that the top-level domain actually
is the same in both cases! Second, I think one of the ideas is to
use a Wiki at these domains, in which case the language starts to
become important. Third, it's much better to do tests in parallel
than to later have somebody claim that there haven't been any tests
in Persian yet,...


>> ...
>> I have read John's various documents on this topic, have listened to
>> talks from him, and have discussed things directly with him. He
>> raises a few good points, but many times makes an elephant out of a
>> mouse, or lets readers think about elephants when they should think
>> about mice.
>
>I didn't know about this author before, and I just read some of his writings about IDNs. If I understand his approach, he suggests dealing with IDNs at the level of the user interface. The idea is to map TLDs locally, that is, if a TLD is in local characters, it is translated to the standard ASCII form, using a translation table where names in local form are kept with their corresponding ASCII form .fr, .uk, .com etc.

This is one of the proposals that John Klensin has made, but by far
not the only one. I have made a similar proposal for scheme names,
so I can't claim that his proposal is totally without merit, but
I think it's not appropriate for TLDs.


>I remember in a recent mail by Sarmad Hussain (see the thread <http://lists.w3.org/Archives/Public/public-i18n-core/2007AprJun/0181.html>) that this approach is also followed locally for the Urdu language. From the above thread, it appears that this is not recommended. There is at least an interoperability problem. You type your native IDN in your computer, your local browser can map it because it has been set locally to do so (e.g. by a plug-in). You copy-paste and send me this same IDN. My browser can't recognize it.

Yes. If you make it an interface issue, you have to keep it strictly
local, which is difficult.
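
The interoperability problem described above can be sketched roughly
as follows, in Python. This is only an illustration: the table
contents, the function name map_tld_locally, and the placeholder
labels "FRANSA" and "ALMAGHREB" (standing in for Arabic-script labels,
as in the quoted text) are assumptions, not part of any actual
proposal or browser implementation.

```python
# Hypothetical local translation table: local-form TLD -> ASCII TLD.
# "FRANSA" and "ALMAGHREB" are the placeholders used earlier in this
# thread, standing in for labels written in Arabic script.
LOCAL_TLD_TABLE = {"FRANSA": "fr", "ALMAGHREB": "ma"}


def map_tld_locally(hostname, table):
    """Replace the last label of hostname if the given table knows it."""
    head, sep, tld = hostname.rpartition(".")
    if not sep:
        # Single-label name: there is no TLD to map.
        return hostname
    return head + sep + table.get(tld, tld)


# A browser configured with the table (e.g. by a plug-in) can map the name.
configured = map_tld_locally("YAHOO.FRANSA", LOCAL_TLD_TABLE)

# A browser without the table leaves the name unchanged, so the same
# copy-pasted name would not resolve there.
unconfigured = map_tld_locally("YAHOO.FRANSA", {})

print(configured)
print(unconfigured)
```

The mapping only works where the table is installed, which is why the
name stops being usable once it leaves the local machine.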


>I completely agree with this kind of opinion, optimistic in nature.
>However, in I18N, it is not always symmetric to talk about countries. With globalization, I don't know if the "99.9% vs. 0.1%" rule will still hold. It is well established that in developing countries, to speak an occidental (and thus a Latin) language is an asset. But this is another question.

Of course we don't want to prohibit the use of Latin script.
And speaking more than one language is definitely an asset
(actually, in many, many countries and regions around the
world, speaking two or more languages, whether two local
languages or a local language and an 'occidental' language,...,
is the norm rather than the exception).

Regards,    Martin.


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     
Received on Friday, 26 October 2007 08:41:03 GMT