Considered harmful: An Introduction to Multilingual Web Addresses

From: Frank Ellermann <nobody@xyzzy.claranet.de>
Date: Wed, 04 Apr 2007 20:57:37 +0200
To: www-international@w3.org
Message-ID: <4613F521.237C@xyzzy.claranet.de>

Richard Ishida wrote:
 
> See the latest version at
> http://www.w3.org/International/articles/idn-and-iri/#phishing

Hi, I think the terminology in this article is very unclear: by
definition a URI follows the syntax specified in STD 66, which
permits only a proper subset of the ASCII characters.

If a UA decides to display the URI as an IRI it's already making
some assumptions, e.g. treating percent-encoded octets as UTF-8
where that makes sense, or using a "ToUnicode" version of what
appear to be IDNA labels in a domain.  The only place where
the latter is supposed to work is the host part of (most) URI
schemes, ignoring "alternate roots" making up their own rules,
or other forms of registered names not belonging to the normal
DNS.
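
To make those two assumptions concrete, here is a rough sketch in
Python of what such a display conversion might look like.  The
function name display_form is made up for illustration, the host
name is only an example, and the sketch simplifies heavily: it
keeps only scheme, host and path, it unescapes every percent-encoded
octet rather than only those where it "makes sense", and it relies
on Python's built-in "idna" codec (IDNA 2003) as a stand-in for
ToUnicode.

```python
from urllib.parse import unquote, urlsplit


def display_form(uri: str) -> str:
    """Guess an IRI-like display string for a URI (illustrative only)."""
    parts = urlsplit(uri)

    # Assumption 1: labels that look like IDNA ("xn--...") in the host
    # part may be shown in their ToUnicode form.
    host = parts.hostname or ""
    try:
        host = host.encode("ascii").decode("idna")
    except UnicodeError:
        pass  # not valid IDNA: leave the host exactly as it was

    # Assumption 2: percent-encoded octets in the path are UTF-8.
    # If they do not decode as UTF-8, keep the escapes untouched.
    try:
        path = unquote(parts.path, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        path = parts.path

    # Simplification: port, userinfo, query and fragment are dropped.
    return f"{parts.scheme}://{host}{path}"


if __name__ == "__main__":
    print(display_form("http://xn--bcher-kva.example/caf%C3%A9"))
    print(display_form("http://www.example.org/caf%E9"))
```

The second call leaves "%E9" alone because a lone E9 octet is not
valid UTF-8, which is the "where that makes sense" part: the UA is
guessing, and the URI itself says nothing about the encoding.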

There's no such thing as a valid URI using any raw "non-ASCII"
octets, Latin-1, UTF-8, UTF-16, or EBCDIC alike.
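
As a minimal illustration of that point, the following Python sketch
tests whether a string uses only characters that STD 66 (RFC 3986)
allows anywhere in a URI: unreserved characters, gen-delims,
sub-delims, and well-formed percent-encoded octets.  The function
name is hypothetical, and this is only a character-level check, not
a validator for the full URI grammar.

```python
import re

# Characters RFC 3986 (STD 66) allows somewhere in a URI: unreserved,
# gen-delims, sub-delims, or "%" followed by exactly two hex digits.
_URI_CHARS = re.compile(
    r"(?:[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=]|%[0-9A-Fa-f]{2})*"
)


def uses_only_uri_characters(candidate: str) -> bool:
    """True if every character is one RFC 3986 permits in a URI.

    This does not check that the characters appear in legal positions.
    """
    return _URI_CHARS.fullmatch(candidate) is not None


if __name__ == "__main__":
    for text in ("http://example.org/caf%C3%A9",
                 "http://example.org/caf\u00e9"):
        print(ascii(text), uses_only_uri_characters(text))
```

A string with a raw non-ASCII character fails this test no matter
which octets would be used to store it, while its percent-encoded
form passes.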

If there's no validator capable of checking the URI syntax as
specified in STD 66, it's harmful to publish invalid pages like
<http://www.w3.org/International/tests/sec-iri-3>

Frank
Received on Wednesday, 4 April 2007 18:58:58 GMT