Re: Error handling in URIs

Anne van Kesteren wrote:

> It's also transmitted as another encoding than UTF-8 
> (while the path component _is_ transmitted as UTF-8).

One of the best things about IRIs is that they are KISS:

They use one and only one charset, the document charset,
wherever they contain non-ASCII characters.

For document types permitting NCRs or similar entities
it means whatever it means in this document type, i.e.
typically Unicode code points or *error* (e.g., using &uuml;
in XML without definition).

What %hh means depends on the server: it might be just
percent-encoded UTF-8 as specified in RFC 3987, or any
binary gibberish (e.g., in data: URIs), or legacy stuff
for FTP servers on top of a legacy file system.
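
To illustrate how the same %hh octets read differently depending on
what the receiving side assumes, here is a minimal Python sketch. The
variable names are illustrative only, and the two charsets are simply
the two discussed in this thread; a real server may assume something
else entirely.

```python
from urllib.parse import unquote, unquote_to_bytes

escaped = "%C3%BC"

# The only thing %hh guarantees is a sequence of octets.
raw = unquote_to_bytes(escaped)                       # b'\xc3\xbc'

# Read as percent-encoded UTF-8 (the RFC 3987 convention): one character.
as_utf8 = unquote(escaped, encoding="utf-8")          # 'ü' (U+00FC)

# Read by a server that assumes iso-8859-1: two characters.
as_latin1 = unquote(escaped, encoding="iso-8859-1")   # 'Ã¼' (U+00C3 U+00BC)

print(raw, as_utf8, as_latin1)
```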

But an iso-8859-1 "ü" in an iso-8859-1 document is an
"ü", also in *all* parts of an IRI, not only the path.

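As a rough sketch of that point in Python: decode the reference with
the document charset first, then apply the UTF-8 percent-encoding step
of RFC 3987 to every non-ASCII character, whether it sits in the path
or in the query. The function name and the sample reference are
hypothetical, and host names are deliberately left out of the sketch.

```python
from urllib.parse import quote

# Characters kept as-is: URI reserved characters plus "%" so that
# existing escapes are not encoded a second time.
KEEP = "/:?#[]@!$&'()*+,;=%"

def iri_ref_to_uri(document_bytes, document_charset):
    """Hypothetical helper: map an IRI reference found in a document
    to a URI reference, in the manner of RFC 3987 section 3.1."""
    # Step 1: the document charset tells us which characters are meant.
    text = document_bytes.decode(document_charset)
    # Step 2: non-ASCII characters become percent-encoded UTF-8,
    # independent of the charset the document was stored in.
    return quote(text, safe=KEEP, encoding="utf-8")

# The same IRI reference stored in two different document charsets.
latin1_doc = "/caf\u00fc?q=\u00fc".encode("iso-8859-1")
utf8_doc = "/caf\u00fc?q=\u00fc".encode("utf-8")

print(iri_ref_to_uri(latin1_doc, "iso-8859-1"))  # /caf%C3%BC?q=%C3%BC
print(iri_ref_to_uri(utf8_doc, "utf-8"))         # /caf%C3%BC?q=%C3%BC
```
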
 Frank

Received on Tuesday, 24 June 2008 23:29:52 UTC