- From: Elliotte Harold <elharo@metalab.unc.edu>
- Date: Fri, 27 Jun 2008 07:47:00 -0700
- To: Ian Hickson <ian@hixie.ch>
- Cc: uri@w3.org
Ian Hickson wrote:

> ...but the URI specification just says that these URI references are
> invalid and doesn't really say what to do with them.

That's because the URI spec doesn't say what to do with any URIs, valid or invalid. That's left up to the application spec.

> The second is with IRIs and character encodings other than UTF-8. While
> browsers reliably encode non-ASCII characters in the path using UTF-8,
> non-ASCII characters in the query component are encoded using the
> document's character encoding, and not UTF-8, which is incompatible with
> how the IRI spec defines things.

You mean, for instance, when submitting a form using GET? Interesting. If so that's a flat-out browser bug and should be fixed.

> Is there any chance that the URI and IRI specifications might get updated
> to handle these issues?

I certainly hope not. This would be a major disaster. Remember, it's not just browsers that matter here. It's all the software receiving this content from browsers. The URI space is not rich enough to include an encoding declaration. There is no way for the target of a URI to tell what encoding the client used unless we can agree on one single, uniform answer. That answer is UTF-8. I don't think that's the encoding I would have preferred (UTF-16 is somewhat simpler in this use case) but it's certainly adequate and there's no reason to change it now. Allowing a multiplicity of encodings in URLs is a recipe for interoperability disaster.

--
Elliotte Rusty Harold  elharo@metalab.unc.edu
Refactoring HTML Just Published!
http://www.amazon.com/exec/obidos/ISBN=0321503635/ref=nosim/cafeaulaitA
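
To make the ambiguity concrete, here is a minimal sketch in Python of what the receiving side sees. It is only an illustration: the sample character, the choice of ISO-8859-1 as the "document encoding", and the helper name `decodes_as_utf8` are all assumptions picked for the example, not anything taken from the URI or IRI specs.

```python
from urllib.parse import quote, unquote

# Sample non-ASCII character (U+00E9, e with acute accent); chosen for illustration.
text = "\u00e9"

# Percent-encode the same character two ways:
# as UTF-8, and as ISO-8859-1 (standing in for a legacy document encoding).
utf8_form = quote(text, encoding="utf-8")          # "%C3%A9"
legacy_form = quote(text, encoding="iso-8859-1")   # "%E9"

# A receiver that guesses the wrong encoding gets a valid but wrong string:
# the two UTF-8 bytes are read as two separate ISO-8859-1 characters.
misread = unquote(utf8_form, encoding="iso-8859-1")


def decodes_as_utf8(escaped: str) -> bool:
    """Hypothetical helper: True if the escaped bytes are well-formed UTF-8."""
    try:
        unquote(escaped, encoding="utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


print(utf8_form, legacy_form, repr(misread))
print(decodes_as_utf8(utf8_form), decodes_as_utf8(legacy_form))
```

Nothing in either escaped string says which encoding produced it. A strict UTF-8 decode can reject the ISO-8859-1 form here, but that is a heuristic, and decoding the UTF-8 form as ISO-8859-1 succeeds silently with the wrong characters.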
Received on Friday, 27 June 2008 14:47:42 UTC