
Re: IDN problem.... :(

From: Frank Yung-Fong Tang <ytang0648@aol.com>
Date: Wed, 16 Feb 2005 11:04:16 -0500
To: "Adam Twardoch" <list.adam@twardoch.com>
cc: www-international@w3.org
Message-ID: <42136F00.2010207@aol.com>

Disallow "mixing characters from different scripts" in one URL? I guess
you mean in one "domainlabel".

See RFC 1738 (http://www.faqs.org/rfcs/rfc1738.html):
"hostport       = host [ ":" port ]
host           = hostname | hostnumber
hostname       = *[ domainlabel "." ] toplabel
domainlabel    = alphadigit | alphadigit *[ alphadigit | "-" ] alphadigit
toplabel       = alpha | alpha *[ alphadigit | "-" ] alphadigit"

right?
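For illustration, here is a minimal Python sketch of the RFC 1738 hostname grammar quoted above, written as a regular expression so the domainlabel unit is explicit. The constant and function names (ALPHADIGIT, DOMAINLABEL, is_rfc1738_hostname, ...) are made up for this example. Because RFC 1738's alpha and alphadigit are ASCII-only, a host containing raw Cyrillic letters does not match this grammar at all.

```python
import re

# Character classes from the RFC 1738 grammar (ASCII letters and digits only).
ALPHA = "[A-Za-z]"
ALPHADIGIT = "[A-Za-z0-9]"

# domainlabel = alphadigit | alphadigit *[ alphadigit | "-" ] alphadigit
DOMAINLABEL = f"{ALPHADIGIT}(?:[A-Za-z0-9-]*{ALPHADIGIT})?"
# toplabel = alpha | alpha *[ alphadigit | "-" ] alphadigit
TOPLABEL = f"{ALPHA}(?:[A-Za-z0-9-]*{ALPHADIGIT})?"
# hostname = *[ domainlabel "." ] toplabel
HOSTNAME = re.compile(f"(?:{DOMAINLABEL}\\.)*{TOPLABEL}")


def is_rfc1738_hostname(host: str) -> bool:
    """True if the whole string matches the RFC 1738 hostname production."""
    return HOSTNAME.fullmatch(host) is not None


for host in ["www.ebay.com", "www.\u0435\u0430\u042c\u0443.com",
             "-ebay.com", "www.ebay.1com"]:
    # ascii() keeps the output printable on non-UTF-8 consoles.
    print(ascii(host), is_rfc1738_hostname(host))
```

Splitting a matching host on "." gives exactly the domainlabels (plus the toplabel) that a per-label script rule would be applied to.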

The problem with this approach is that it is not good enough to prevent
spoofing.

Try this:

http://oss.software.ibm.com/cgi-bin/icu/idnademo?t=http%3A%2F%2Fwww.%D0%B5%D0%B0%D0%AC%D1%83.com

It is all Cyrillic, and if you view the text in a Courier font,
http://www.еаЬу.com looks just like http://www.ebay.com.
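To check this, a short Python sketch can decode the host from the ICU demo URL above and list each code point with its Unicode character name. One assumption: the hypothetical helper rough_script takes the first word of the character name, which is only a rough stand-in for the real Unicode Script property. It is still enough to show that every character in the label is Cyrillic, so a "no mixed scripts" rule has nothing to object to. The ASCII form comes from Python's built-in "idna" codec, which implements IDNA 2003 (RFC 3490).

```python
import unicodedata
from urllib.parse import unquote


def rough_script(ch: str) -> str:
    # Assumption: the first word of the character name approximates its
    # script, e.g. "CYRILLIC SMALL LETTER IE" -> "CYRILLIC".
    return unicodedata.name(ch, "UNKNOWN").split()[0]


# Host part of the ICU demo URL, percent-encoded as UTF-8.
host = unquote("www.%D0%B5%D0%B0%D0%AC%D1%83.com")
label = host.split(".")[1]  # the label that imitates "ebay"

for ch in label:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

print("scripts:", {rough_script(ch) for ch in label})
print("same as 'ebay'?", label == "ebay")
# ASCII-compatible (ACE) form produced by Python's IDNA 2003 codec.
print("ACE form:", label.encode("idna"))
```

The label is a different string from "ebay" and uses a single script, which is exactly why a mixed-script check alone does not flag it.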

Adam Twardoch wrote on 2/15/2005, 5:26 AM:

 > I'm talking about disallowing the mixing of characters from different
 > scripts in one URL.
 >
 > European digits are used in Latin, Cyrillic, Greek, Kanji, Arabic and
 > other scripts. Similarly, the hyphen and the ampersand are not tied to
 > one particular script. Using European digits with Cyrillic, Latin or
 > Arabic is not "mixing characters from different scripts", so there is
 > no problem.
 >
 > Regards,
 > Adam
 >
 > ----- Original Message ----- From: "Najib Tounsi" <ntounsi@emi.ac.ma>
 > To: "Adam Twardoch (by way of Martin Duerst <duerst@w3.org>)"
 > <list.adam@twardoch.com>
 > Cc: <www-international@w3.org>
 > Sent: Monday, February 14, 2005 7:15 PM
 > Subject: Re: IDN problem.... :(
 >
 >
 > >Adam Twardoch (by way of Martin Duerst <duerst@w3.org>) wrote:
 > >
 > >>----- Original Message ----- From: "John Hudson" <tiro@tiro.com>
 > >>
 > >>>The security issue is simply due to the fact that some characters
 > >>>typically look identical to other characters. So change the
 > >>>appearance.
 > >>
 > >>
 > >>Nah. It's poor design of IDN. They should have disallowed mixing
 > >>characters from different scripts in one URL.
 > >
 > >01234... are sometimes called Arabic digits and also belong to the
 > >Arabic script (used in western Arabic countries). Would www.UNIV5.ma,
 > >where UNIV5 is written in Arabic, be wrong?
 > >
 > >>It wouldn't have ruled out all of the problems, but most of them.
 > >>
 > >>A.
 > >
 > >--
 > >Najib TOUNSI (mailto:tounsi@w3.org)
 > >Bureau W3C au Maroc (http://www.w3c.org.ma/)
 > >Ecole Mohammadia d'Ingenieurs, BP 765 Agdal-RABAT Maroc (Morocco)
 > >Phone : +212 (0) 37 68 71 74  Fax : +212 (0) 37 77 88 53
 > >Mobile: +212 (0) 61 22 00 30
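Adam's rule, together with the digit question Najib raises, can be sketched in Python as a per-label check (the "domainlabel" reading Frank suggests) that treats the European digits and the hyphen as script-neutral. Assumptions: the first word of a character's Unicode name stands in for its script, which only approximates the real Unicode Script property; the Arabic label is a hypothetical stand-in for Najib's "UNIV5", since the thread does not give the actual letters; the "paypal" look-alike is likewise a made-up example; all function names are illustrative.

```python
import unicodedata

# Per Adam's note: European digits and the hyphen are not tied to one script.
NEUTRAL = set("0123456789-")


def rough_script(ch: str) -> str:
    # Assumption: first word of the character name approximates its script.
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def label_is_single_script(label: str) -> bool:
    # Collect the scripts used by the label, ignoring script-neutral characters.
    scripts = {rough_script(ch) for ch in label if ch not in NEUTRAL}
    return len(scripts) <= 1


def host_passes(host: str) -> bool:
    # Apply the rule to each domainlabel separately, not to the whole URL.
    return all(label_is_single_script(label) for label in host.split("."))


examples = [
    "univ5.ma",
    # Hypothetical Arabic-letter label followed by a European digit.
    "\u062c\u0627\u0645\u0639\u0629" + "5.ma",
    # Made-up look-alike: Latin letters with one Cyrillic "a" (U+0430).
    "p\u0430ypal.com",
    # The all-Cyrillic label from the ICU demo URL.
    "www.\u0435\u0430\u042c\u0443.com",
]
for host in examples:
    print(ascii(host), host_passes(host))
```

With the digits treated as neutral, the Arabic-plus-digit label is accepted and a single swapped Cyrillic letter is rejected, but, as Frank points out, the all-Cyrillic label still passes.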
Received on Wednesday, 16 February 2005 16:05:06 GMT
