- From: Kane, Pat <pkane@verisign.com>
- Date: Wed, 16 Feb 2005 08:27:12 -0500
- To: "'Erik van der Poel'" <erik@vanderpoel.org>
- Cc: www-international@w3.org
Erik,

We are not "catering" to anybody, we just recognize that there is no good way to create a table for a language that has gone from Arabic to Latin to Cyrillic and back to Latin in about 100 years. It is a stretch comparison, but look at the rules for mapping Simplified Chinese and Traditional Chinese.

I do sympathize, that is why we monitor script mixing and actual registrations that contain multiple scripts.

What I would like to see is a language table for all languages that we permit from a tag standpoint.

Pat

-----Original Message-----
From: Erik van der Poel [mailto:erik@vanderpoel.org]
Sent: Wednesday, February 16, 2005 2:29 AM
To: Kane, Pat
Cc: www-international@w3.org
Subject: Re: IDN problem.... :(

Hello Pat,

Thank you for chiming in. This is exactly the kind of info that I need, to understand the problem a bit better.

I think it's great that you seem to want to cater to these former Soviet Union nations with special needs, but I wonder, do you also sympathize with those in other parts of the world that might be duped by spoofed characters? And, if so, do you have a proposed solution?

I'm beginning to think that maybe the only way out of this mess is to fold the homographs into single codes (as has been proposed by several people). I.e. you may start with Latin small letter 'a' and Cyrillic small letter 'a' (with distinct codes), but after nameprepping they would no longer be distinct and would have the same code (probably the code for Latin small letter 'a' in order to be compatible with legacy DNS names).

Of course, this wouldn't be Nameprep as it is currently defined. It would be a different prep, e.g. BetterPrep. And of course, it would be really difficult to come up with a spec for this betterprep that satisfies everyone. A pair of glyphs that look similar to one person might look different to another.

But perhaps a good starting point is the "cmap" of a popular font, say, Arial. If Arial's cmap has the same glyph index for a pair of characters, then we enter this pair into the betterprep spec draft. Then we look at the other glyphs and decide whether or not they ought to be considered homographs (homoglyphs? :-)

At the end of this laborious and controversial process, we have the next Maturity Level of the IDN RFCs, i.e. Draft Standard, and they are given a new prefix (i.e. something other than xn--).

At this point, the registries go through their existing xn-- names, decode them, run them through betterprep, and resolve any conflicts in the same way that trademark grievances are addressed.

After that, we have one final iteration for, you guessed it, the Standard Maturity Level, with yet another prefix and a prep called BestPrep, the final version. It may be that BestPrep is almost the same as BetterPrep, or even identical, in which case we don't need a new prefix.

Of course, we will have to give the applications some time to prepare for the new prefixes, so there would be a certain amount of time between publication of the RFCs and recoding of the registries.

Thoughts?

Erik

Kane, Pat (by way of Martin Duerst <duerst@w3.org>) wrote:
>
> Commingling of scripts certainly is the issue here but they must be
> permitted for certain communities as their languages utilize multiple
> characters from multiple scripts. During the development of ICANN's IDN
> guidelines I presented the details about the script mixing within com
> and net. There were very few issues around the mixing of Latin and
> other scripts with the exception of Cyrillic and Greek. However, there
> are several former Soviet Union nations that originally used Latin
> characters then converted to Cyrillic characters and who are now
> returning to Latin that need just this commingling. Yes, there are very
> few registrations that come from Tajikistan, but these are the types of
> communities that IDNs were developed for.
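
For readers following the thread, the two ideas discussed above (folding homographs into a single code, and monitoring labels that mix scripts) can be roughly sketched in Python. This is only an illustration under stated assumptions: the table HOMOGRAPH_FOLD and the functions betterprep and scripts_used are hypothetical names invented here, the table holds a handful of hand-picked Cyrillic/Latin pairs rather than anything derived from a font's cmap, and none of it is part of Nameprep or any IDN RFC. Script detection is approximated from Unicode character names, not from the Unicode Script property.

```python
import unicodedata

# Hypothetical, hand-picked homograph table: Cyrillic letter -> Latin look-alike.
# A real table would have to be agreed on, e.g. seeded from a font's cmap.
HOMOGRAPH_FOLD = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
}


def betterprep(label):
    """Lowercase a label, then fold each listed homograph to its Latin code."""
    return "".join(HOMOGRAPH_FOLD.get(ch, ch) for ch in label.lower())


def scripts_used(label):
    """Approximate the scripts in a label from the first word of each
    letter's Unicode name (e.g. 'LATIN', 'CYRILLIC')."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


if __name__ == "__main__":
    spoof = "p\u0430ypal"  # second letter is Cyrillic, not Latin
    print(betterprep(spoof) == "paypal")  # the folded label collides with the Latin one
    print(sorted(scripts_used(spoof)))    # lists the scripts found in the label
```

With folding, the spoofed label and the all-Latin label become the same string, so a registry would see them as a conflict. Without folding, the script check alone would only flag the label as mixed, which is the monitoring approach Pat describes.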
Received on Wednesday, 16 February 2005 13:27:21 UTC