
Re: Error handling in URIs

From: Frank Ellermann <nobody@xyzzy.claranet.de>
Date: Wed, 25 Jun 2008 01:30:52 +0200
To: uri@w3.org
Message-ID: <g3s003$q56$1@ger.gmane.org>

Anne van Kesteren wrote:

> It's also transmitted as another encoding than UTF-8 
> (while the path component _is_ transmitted as UTF-8).

One of the best things about IRIs is that they are KISS:

They use one and only one charset, the document charset,
wherever they contain non-ASCII characters.
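A minimal Python sketch of that rule. It assumes the document charset is already known (hard-coded here; a real parser would take it from the protocol header, a BOM, or an in-document declaration), and the helper `extract_href` and its naive regex are hypothetical. The whole document is decoded once with its own charset, so a non-ASCII character in an IRI ends up as the same Unicode character whatever the byte encoding was.

```python
import re


def extract_href(doc_bytes: bytes, charset: str) -> str:
    """Decode the whole document with its own charset, then pull out href."""
    text = doc_bytes.decode(charset)  # one charset for the entire document
    match = re.search(r'href="([^"]*)"', text)  # naive: illustration only
    if match is None:
        raise ValueError("no href attribute found")
    return match.group(1)


# The same IRI, stored once as ISO-8859-1 bytes and once as UTF-8 bytes.
latin1_doc = b'<a href="http://example.org/\xfc?q=\xfc">x</a>'
utf8_doc = '<a href="http://example.org/\u00fc?q=\u00fc">x</a>'.encode("utf-8")

print(extract_href(latin1_doc, "iso-8859-1"))  # http://example.org/ü?q=ü
print(extract_href(utf8_doc, "utf-8"))         # http://example.org/ü?q=ü
```

Both documents yield the identical character string, in the path and in the query alike.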

For document types permitting NCRs (numeric character
references) or similar entities, such a reference means
whatever it means in that document type, i.e. typically
a Unicode code point, or an *error* (e.g., using &uuml;
in XML without declaring it).
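A short illustration using only the Python standard library (the example IRI is made up): an NCR such as `&#252;` resolves to U+00FC in XML, the named entity `&uuml;` is predefined in HTML, and the same `&uuml;` in XML without a DTD declaring it is a parse error.

```python
import html
import xml.etree.ElementTree as ET

# A numeric character reference resolves to a Unicode code point in XML.
ncr_result = ET.fromstring('<a href="http://example.org/&#252;"/>').get("href")
print(ncr_result)  # http://example.org/ü

# &uuml; is a predefined named entity in HTML ...
html_result = html.unescape("http://example.org/&uuml;")
print(html_result)  # http://example.org/ü

# ... but in XML without a declaration it is an undefined entity: an error.
xml_error = None
try:
    ET.fromstring('<a href="http://example.org/&uuml;"/>')
except ET.ParseError as err:
    xml_error = err
print("XML error:", xml_error)
```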

What %hh means depends on the server: it might be plain
percent-encoded UTF-8 as specified in RFC 3987, any
binary gibberish (e.g., in data: URIs), or legacy stuff
for FTP servers sitting on top of a legacy file system.
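A sketch with Python's `urllib.parse` showing that the octets behind `%hh` escapes are fixed while their meaning is up to whoever interprets them. The paths and the PNG-signature payload are illustrative assumptions, and "Latin-1 legacy server" stands in for any non-UTF-8 server.

```python
from urllib.parse import unquote, unquote_to_bytes

utf8_path = "/%C3%BC"  # "ü" percent-encoded as UTF-8, as RFC 3987 specifies
legacy_path = "/%FC"   # "ü" as a single ISO-8859-1 octet

raw = unquote_to_bytes(utf8_path)
print(raw)  # b'/\xc3\xbc'

as_utf8 = unquote(utf8_path)  # unquote() decodes as UTF-8 by default
print(as_utf8)  # /ü

# The same legacy octet reads as "ü" to a Latin-1 server but is invalid
# UTF-8, so a UTF-8 reading yields U+FFFD (errors="replace" by default).
legacy_latin1 = unquote(legacy_path, encoding="iso-8859-1")
legacy_utf8 = unquote(legacy_path)
print(legacy_latin1)  # /ü

# In a data: URI the escapes can carry arbitrary binary, not text at all.
payload = unquote_to_bytes("%89PNG%0D%0A%1A%0A")
print(payload)  # b'\x89PNG\r\n\x1a\n'
```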

But an iso-8859-1 "ü" in an iso-8859-1 document is a
"ü" in *all* parts of an IRI, not only in the path.
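A simplified sketch of the RFC 3987 IRI-to-URI mapping applied uniformly to every component. The function `iri_to_uri` is hypothetical: it only UTF-8 percent-encodes non-ASCII characters, and it ignores details such as the IDNA alternative for host names and the handling of ASCII characters not allowed in URIs.

```python
def iri_to_uri(iri: str) -> str:
    """Simplified RFC 3987 step: UTF-8 percent-encode every non-ASCII char."""
    parts = []
    for ch in iri:
        if ord(ch) < 0x80:
            parts.append(ch)  # ASCII passes through unchanged
        else:
            # Encode the character as UTF-8, then escape each octet as %HH.
            parts.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(parts)


# The "ü" was read from an ISO-8859-1 document, so it is U+00FC everywhere.
iri = b"http://example.org/\xfc?q=\xfc#\xfc".decode("iso-8859-1")
print(iri_to_uri(iri))  # http://example.org/%C3%BC?q=%C3%BC#%C3%BC
```

Path, query, and fragment all receive the same `%C3%BC`, because the character is the same in each of them.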

 Frank
Received on Tuesday, 24 June 2008 23:29:52 GMT
