- From: Martin Duerst <duerst@it.aoyama.ac.jp>
- Date: Fri, 26 Oct 2007 14:54:21 +0900
- To: John Cowan <cowan@ccil.org>, Najib Tounsi <ntounsi@emi.ac.ma>
- Cc: John Cowan <cowan@ccil.org>, Daniel Dardailler <danield@w3.org>, "'WWW International'" <www-international@w3.org>, W3C Offices <w3c-office-pr@w3.org>, public-i18n-core@w3.org
One very important property of TLDs that I haven't mentioned in my previous posts is that they are one-off creations. This has various implications.

One implication is with respect to spoofing. For second-level domains (or third-level domains, in cases such as foo.co.uk), which are open to everybody, general rules against spoofing have to be designed (e.g. don't mix scripts, ...). On the TLD level, by contrast, spoofing can be eliminated by considering each case in turn. The example I always use for this is that Russia in Cyrillic would most naturally get a two-letter code looking like .py, but this would spoof Paraguay, so something different has to be found for Russia, e.g. transliterated r-ja (in actual writing, what looks like a p followed by a mirrored R). Case-by-case checking is in many ways much easier than defining general rules, and this means that the spoofing scare that some people have is not at all justified for TLDs, because it can easily be avoided.

A second implication is that we only need to find one case that works reasonably, rather than make all cases work. As an example, consider the task of finding something equivalent to .com for the East Asian region using Han ideographs. [Let's for the moment assume that we want such a thing; I at least think that such gTLDs should be second priority to ccTLDs.] In each language/region, there are various Han characters that are used for companies/commerce/... Some may have simplified/traditional variants. Some may only be used in some regions, being associated with something completely different in other regions. As an example, company is 公司 in Chinese, but 会社 in Japanese, so neither of these works. But looking around, a character such as 商 may indeed express the concept closely enough in all regions. [I have checked this for the above character, and I think it's true.
I have also been told that this character is associated with a dynasty in China, which may be an issue to consider, although I don't think reserving characters for dynasty TLDs makes any sense. If somebody knows more, I'm glad to learn.]

Using the same script usually means that there is also some sharing of culture or vocabulary. It is well known that, e.g. for the Arabic script, there are quite a few words of Arabic (language) origin that are used in languages that are linguistically not related to Arabic. The same holds for most other scripts and languages. It is clear that it may be difficult to find examples that reach into each and every language written with the script, but it's also clear that with a bit of work and deliberation, it should be possible to cover a high percentage of users, not just an accidental majority.

In addition, abbreviations often help. They definitely helped in the cases of .com or .net,... Such abbreviations can be seen as mere letter combinations, potentially gaining a meaning of their own. This is more difficult with longer words.

In conclusion: Try to find cases that work. If we find something that works, we are done. We don't need to use the things that don't work. I think that's the main point where I'm very unhappy with ICANN, and also with some of what Daniel has said. Rather than spending the main energy on finding things that work, it at least looks like most energy is spent on trying to find counterexamples and problems that, if avoided, are not relevant at all.

What's also frustrating is that the counterarguments mostly seem to be coming from people who have little day-in-day-out experience with non-Latin scripts, even if they may know a lot about many scripts and languages. What most of these people don't realize is that learning another script, e.g. the 26 letters of the Latin alphabet, is not equivalent to being truly fluent in that script, which independently of the script takes years.
Indeed, in Europe, every second-grader (and these days indeed most first-graders) knows the letters of the alphabet, but this in no way means they are fluent in the Latin script. As another example, I have no problems reading Japanese newspapers or technical publications, and I teach in Japanese at a Japanese university using my own Japanese materials. Still, after a total of more than 10 years in Japan, my speed of reading Japanese is quite a bit lower than that of reading English or German, and the difference in the speed of finding a word on a page or a topic in a book is even larger.

Regards, Martin.

At 11:43 07/10/26, John Cowan wrote:
>Najib Tounsi scripsit:
>
>> Or the 23 countries and territories with a combined population of some
>> 325 million of users?
>
>Sure, Arabic is the largest language written in Arabic script, but
>Arabic script is used in many countries where Arabic is not spoken,
>and Urdu plus the various kinds of Persian probably account for
>half as many speakers.
>
>If you don't like that example, consider Cyrillic. Should all
>the ccTLDs in Cyrillic script be Russian-based? At least
>some of the Latin ccTLDs aren't English-based, even though
>English is far and away the largest Latin-script language.
>
>--
>John Cowan cowan@ccil.org http://ccil.org/~cowan
>"The exception proves the rule." Dimbulbs think: "Your counterexample proves
>my theory." Latin students think "'Probat' means 'tests': the exception puts
>the rule to the proof." But legal historians know it means "Evidence for an
>exception is evidence of the existence of a rule in cases not excepted from."

#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
Received on Friday, 26 October 2007 08:40:56 UTC