- From: Ian Hickson <ian@hixie.ch>
- Date: Tue, 24 Jun 2008 10:09:53 +0000 (UTC)
- To: uri@w3.org
Hi,

I recently started addressing issues related to URIs in the context of the HTML5 specification. In general I am trying to defer as much as possible to the URI, IRI, IDN, and XML Base specifications, but there are a couple of issues that are left undefined by those specifications which I am having trouble with.

The first is error handling behaviour for URIs. Browsers are reasonably consistent in their handling of invalid URI references such as:

   http://example.com/hello world/

...or:

   {{%%xx##

...but the URI specification just says that these URI references are invalid and doesn't really say what to do with them.

The second is with IRIs and character encodings other than UTF-8. While browsers reliably encode non-ASCII characters in the path using UTF-8, non-ASCII characters in the query component are encoded using the document's character encoding, and not UTF-8, which is incompatible with how the IRI spec defines things.

Is there any chance that the URI and IRI specifications might get updated to handle these issues?

At the moment, I'm working around these issues by "wrapping" the URI specs with pre- and post- processing steps and by requiring that implementations use slightly different definitions for the ABNF productions, which is rather dubious. You can see this work in progress here:

   http://www.whatwg.org/specs/web-apps/current-work/#urls

(It's woefully incomplete.)

It would be much cleaner if instead HTML5 could just defer to the URI specs for everything URI-related.

Cheers,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
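
The path/query encoding mismatch described in the message can be illustrated with a minimal sketch. It assumes Python's urllib.parse.quote as a stand-in for a browser's percent-encoding step; the helper names encode_path and encode_query are hypothetical, and real browsers apply further rules that are not modelled here.

```python
from urllib.parse import quote


def encode_path(path: str) -> str:
    """Percent-encode a path, always treating non-ASCII characters as UTF-8.

    This mirrors the behaviour the message attributes to browsers for the
    path component (hypothetical helper, for illustration only).
    """
    return quote(path, safe="/", encoding="utf-8")


def encode_query(query: str, document_encoding: str) -> str:
    """Percent-encode a query using the document's character encoding.

    This mirrors the behaviour the message attributes to browsers for the
    query component; the IRI specification would use UTF-8 here instead.
    """
    return quote(query, safe="=&", encoding=document_encoding)


# The same character yields different bytes depending on the component
# and on the encoding of the containing document.
path_utf8 = encode_path("/caf\u00e9/")
query_latin1 = encode_query("q=caf\u00e9", "iso-8859-1")
query_utf8 = encode_query("q=caf\u00e9", "utf-8")
```

In a document served as ISO-8859-1, the path would carry the two-byte UTF-8 sequence for "é" while the query would carry the single byte %E9, so a server cannot decode both components with one encoding.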
Received on Tuesday, 24 June 2008 10:10:30 UTC