RE: IDN problem.... :(

Erik,

We are not "catering" to anybody; we just recognize that there is no good
way to create a table for a language that has gone from Arabic to Latin to
Cyrillic and back to Latin in about 100 years.  The comparison is a stretch,
but look at the rules for mapping Simplified Chinese and Traditional
Chinese.  I do sympathize; that is why we monitor script mixing and actual
registrations that contain multiple scripts.  What I would like to see is a
language table for all languages that we permit from a tag standpoint.
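
To make "monitoring script mixing" concrete, here is a rough sketch of how
a label could be flagged when it contains letters from more than one
script. It is an illustration only, not a description of any registry's
actual tooling. The function names are made up, and the script is
approximated by the first word of each character's Unicode name (LATIN,
CYRILLIC, GREEK, ...), which is a heuristic rather than the real Unicode
Script property.

```python
import unicodedata


def scripts_in(label):
    """Return the set of approximate script names used by letters in label.

    Assumption: the first word of the Unicode character name stands in
    for the script. Digits and hyphens are ignored.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def is_mixed(label):
    """True if the label's letters come from more than one script."""
    return len(scripts_in(label)) > 1
```

A check like this only flags candidates for review. It cannot tell a
legitimate mixed-script registration from a spoof.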


Pat

-----Original Message-----
From: Erik van der Poel [mailto:erik@vanderpoel.org] 
Sent: Wednesday, February 16, 2005 2:29 AM
To: Kane, Pat
Cc: www-international@w3.org
Subject: Re: IDN problem.... :(

Hello Pat,

Thank you for chiming in. This is exactly the kind of info that I need
to understand the problem a bit better. I think it's great that you seem
to want to cater to these former Soviet Union nations with special
needs, but I wonder, do you also sympathize with those in other parts of
the world that might be duped by spoofed characters? And, if so, do you
have a proposed solution?

I'm beginning to think that maybe the only way out of this mess is to
fold the homographs into single codes (as has been proposed by several
people). I.e. you may start with Latin small letter 'a' and Cyrillic
small letter 'a' (with distinct codes), but after nameprepping they
would no longer be distinct and would have the same code (probably the
code for Latin small letter 'a' in order to be compatible with legacy
DNS names).
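
As a minimal sketch of the folding idea, assuming a hand-picked table of
Cyrillic letters and their Latin look-alikes: the table contents and the
function name betterprep are hypothetical, and a real spec would also
need the normalization and case-folding steps that Nameprep already
performs.

```python
# Hypothetical fold table: Cyrillic letters mapped to Latin look-alikes.
# A real table would be far larger and would need community review.
FOLD = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}
_TABLE = str.maketrans(FOLD)


def betterprep(label):
    """Lowercase the label, then fold look-alikes onto the Latin code."""
    return label.lower().translate(_TABLE)
```

With this table, a label spelled with Cyrillic 'a' and the all-Latin
spelling come out as the same string, so they can no longer be registered
as two distinct names.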

Of course, this wouldn't be Nameprep as it is currently defined. It
would be a different prep, e.g. BetterPrep.

And of course, it would be really difficult to come up with a spec for
this betterprep that satisfies everyone. A pair of glyphs that look
similar to one person might look different to another.

But perhaps a good starting point is the "cmap" of a popular font, say,
Arial. If Arial's cmap has the same glyph index for a pair of
characters, then we enter this pair into the betterprep spec draft. Then
we look at the other glyphs and decide whether or not they ought to be
considered homographs (homoglyphs? :-)
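
The cmap idea could be sketched roughly as follows. The cmap here is a toy
dictionary from code point to glyph index with made-up values; it is not
Arial's actual table, which would have to be read out of the font file
with a font library.

```python
from collections import defaultdict


def homograph_candidates(cmap):
    """Group code points that share a glyph index.

    cmap maps code point -> glyph index. Only groups with more than one
    code point are returned, each as a sorted list.
    """
    by_glyph = defaultdict(list)
    for code_point, glyph_index in cmap.items():
        by_glyph[glyph_index].append(code_point)
    return [sorted(cps) for cps in by_glyph.values() if len(cps) > 1]


# Toy data with invented glyph indices, for illustration only.
TOY_CMAP = {
    0x0061: 68,   # LATIN SMALL LETTER A
    0x0430: 68,   # CYRILLIC SMALL LETTER A (same glyph in this toy font)
    0x0062: 69,   # LATIN SMALL LETTER B
    0x0431: 500,  # CYRILLIC SMALL LETTER BE
}
```

This only finds pairs that share a glyph outright. Glyphs that are
distinct in the font but look alike would still need the manual review
described above.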

At the end of this laborious and controversial process, we have the next
Maturity Level of the IDN RFCs, i.e. Draft Standard, and they are given
a new prefix (i.e. something other than xn--). At this point, the
registries go through their existing xn-- names, decode them, run them
through betterprep, and resolve any conflicts in the same way that
trademark grievances are addressed.
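
The registry pass might look something like the sketch below. It assumes a
tiny hypothetical fold table standing in for betterprep, and it uses
Python's built-in "punycode" codec to decode the part after xn--. It only
reports which existing names would collide; how to resolve them is a
policy question.

```python
from collections import defaultdict

# Hypothetical stand-in for the betterprep fold table.
FOLD = str.maketrans({"\u0430": "a", "\u0435": "e", "\u043e": "o"})


def decode_label(label):
    """Decode an xn-- label to Unicode; other labels pass through."""
    if label.lower().startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


def find_conflicts(labels):
    """Map each folded form to the registered labels that collapse to it.

    Only folded forms shared by more than one label are returned.
    """
    groups = defaultdict(list)
    for label in labels:
        folded = decode_label(label).lower().translate(FOLD)
        groups[folded].append(label)
    return {k: v for k, v in groups.items() if len(v) > 1}
```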

After that, we have one final iteration for, you guessed it, the
Standard Maturity Level, with yet another prefix and a prep called
BestPrep, the final version. It may be that BestPrep is almost the same
as BetterPrep, or even identical, in which case we don't need a new prefix.

Of course, we will have to give the applications some time to prepare
for the new prefixes, so there would be a certain amount of time between
publication of the RFCs and recoding of the registries.

Thoughts?

Erik

Kane, Pat (by way of Martin Duerst <duerst@w3.org>) wrote:
> 
> Commingling of scripts certainly is the issue here but they must be 
> permitted for certain communities as their languages utilize multiple 
> characters from multiple scripts.  During the development of ICANN's IDN 
> guidelines I presented the details about the script mixing within com 
> and net.  There were very few issues around the mixing of Latin and 
> other scripts with the exception of Cyrillic and Greek.  However, there 
> are several former Soviet Union nations that originally used Latin 
> characters then converted to Cyrillic characters and who are now 
> returning to Latin that need just this commingling.  Yes, there are very 
> few registrations that come from Tajikistan, but these are the types of 
> communities that IDNs were developed for.

Received on Wednesday, 16 February 2005 13:27:21 UTC