Re: Error handling in URIs

From: Elliotte Harold <elharo@metalab.unc.edu>
Date: Fri, 27 Jun 2008 07:47:00 -0700
Message-ID: <4864FD64.5080007@metalab.unc.edu>
To: Ian Hickson <ian@hixie.ch>
Cc: uri@w3.org

Ian Hickson wrote:

> ...but the URI specification just says that these URI references are 
> invalid and doesn't really say what to do with them.

That's because the URI spec doesn't say what to do with any URIs, valid 
or invalid. That's left up to the application spec.

> The second is with IRIs and character encodings other than UTF-8. While 
> browsers reliably encode non-ASCII characters in the path using UTF-8, 
> non-ASCII characters in the query component are encoded using the 
> document's character encoding, and not UTF-8, which is incompatible with 
> how the IRI spec defines things.

You mean, for instance, when submitting a form using GET? Interesting. 
If so, that's a flat-out browser bug and should be fixed.
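
To make the mismatch concrete, here is a minimal Python sketch of the 
behavior described in the quoted paragraph. It is an illustration, not 
browser code: the sample string "café" is made up, and ISO-8859-1 stands 
in for whatever legacy encoding the document might use.

```python
from urllib.parse import quote

# Hypothetical sample value; any non-ASCII character would do.
text = "café"

# Percent-encode the UTF-8 bytes, as the IRI spec expects.
utf8_form = quote(text, encoding="utf-8")

# Percent-encode the ISO-8859-1 bytes, standing in for a legacy
# document encoding (an assumption made for illustration).
latin1_form = quote(text, encoding="iso-8859-1")

print(utf8_form)    # caf%C3%A9
print(latin1_form)  # caf%E9
```

The same character ends up as two different byte sequences on the wire, 
depending only on the encoding of the page that produced the request.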

> Is there any chance that the URI and IRI specifications might get updated 
> to handle these issues?

I certainly hope not. This would be a major disaster. Remember, it's not 
just browsers that matter here. It's all the software receiving this 
content from browsers.

The URI space is not rich enough to include an encoding declaration. 
There is no way for the target of a URI to tell what encoding the client 
used unless we can agree on one single, uniform answer. That answer is 
UTF-8. I don't think that's the encoding I would have preferred (UTF-16 
is somewhat simpler in this use case) but it's certainly adequate and 
there's no reason to change it now.
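
The receiving side of this problem can be sketched the same way. The 
Python below is a hedged illustration, not any particular server's 
code, and the query value is invented. It shows that the percent-encoded 
bytes alone do not identify the encoding: the same bytes decode without 
error under both UTF-8 and ISO-8859-1, but to different strings.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical percent-encoded value as it might arrive at a server.
raw_bytes = unquote_to_bytes("caf%C3%A9")

# Both decodes succeed; nothing in the URI says which one the client meant.
as_utf8 = raw_bytes.decode("utf-8")
as_latin1 = raw_bytes.decode("iso-8859-1")

print(as_utf8)               # café
print(as_latin1)             # cafÃ©
print(as_utf8 == as_latin1)  # False
```

Without a single agreed encoding, the receiver can only guess between 
these two readings.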

Allowing a multiplicity of encodings in URLs is a recipe for 
interoperability disaster.

Elliotte Rusty Harold  elharo@metalab.unc.edu
Refactoring HTML Just Published!
Received on Friday, 27 June 2008 14:47:42 UTC
