- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Wed, 04 Apr 2007 20:57:37 +0200
- To: www-international@w3.org
Richard Ishida wrote:

> See the latest version at
> http://www.w3.org/International/articles/idn-and-iri/#phishing

Hi, I think the terminology in this article is very unclear:

By definition a URI follows the syntax specified in STD 66, which allows only a proper subset of ASCII characters. If a UA decides to display the URI as an IRI it's already making some assumptions, e.g. treating percent-encoded octets as UTF-8 where that makes sense, or using a "ToUnicode" version of what appear to be IDNA labels in a domain.

The only place where the latter is supposed to work is the host part of (most) URI schemes, ignoring "alternate roots" making up their own rules, or other forms of registered names not belonging to the normal DNS.

There's no such thing as a valid URI using any raw "non-ASCII" octets, Latin-1, UTF-8, UTF-16, or EBCDIC alike. If there's no validator capable of checking the URI syntax as specified in STD 66 it's harmful to publish invalid pages like <http://www.w3.org/International/tests/sec-iri-3>

Frank
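To make the distinction above concrete, here is a minimal Python sketch of the two steps the message describes: a character-level check that a string stays within the ASCII repertoire STD 66 allows, and the guesswork a UA does when it shows a URI in IRI-like form. The function names `uses_only_uri_characters` and `display_form` are made up for illustration, the check is not a full STD 66 grammar validator, and the host conversion relies on Python's built-in `idna` codec (IDNA 2003), which is an assumption about how a UA might do it.

```python
import re
from urllib.parse import urlsplit, unquote

# Characters that may appear in an STD 66 (RFC 3986) URI: unreserved,
# gen-delims, sub-delims, and "%" introducing a percent-encoded octet.
_URI_CHARS = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*")
# A "%" that is not followed by exactly two hex digits is malformed.
_PCT_BROKEN = re.compile(r"%(?![0-9A-Fa-f]{2})")


def uses_only_uri_characters(text: str) -> bool:
    """Rough character-level check (hypothetical helper, not a full validator)."""
    return bool(_URI_CHARS.fullmatch(text)) and not _PCT_BROKEN.search(text)


def display_form(uri: str) -> str:
    """Guess an IRI-like display form, as a UA might (illustrative only).

    Simplified: handles scheme, host and path; drops port, userinfo,
    query and fragment. Decoding reserved characters such as %2F would
    change the meaning of the URI, so this is for display only.
    """
    if not uses_only_uri_characters(uri):
        raise ValueError("not a URI: characters outside the STD 66 repertoire")
    parts = urlsplit(uri)
    host = parts.hostname or ""
    try:
        # Assumption 1: "xn--" labels in the host are IDNA labels (ToUnicode).
        host = host.encode("ascii").decode("idna")
    except UnicodeError:
        pass  # keep the ASCII form if the labels do not decode
    try:
        # Assumption 2: percent-encoded octets in the path are UTF-8.
        path = unquote(parts.path, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        path = parts.path  # not UTF-8 (e.g. Latin-1): leave it encoded
    return f"{parts.scheme}://{host}{path}"
```

For example, `display_form("http://xn--bcher-kva.example/caf%C3%A9")` yields a form with a Unicode host and path, whereas `http://example.org/caf%E9` is left as it is because `%E9` is not valid UTF-8, and a string containing a raw non-ASCII character is rejected outright as not being a URI.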
Received on Wednesday, 4 April 2007 18:58:58 UTC