- From: Ian Hickson <ian@hixie.ch>
- Date: Tue, 24 Jun 2008 10:09:53 +0000 (UTC)
- To: uri@w3.org
Hi,
I recently started addressing issues related to URIs in the context of the
HTML5 specification. In general I am trying to defer as much as possible
to the URI, IRI, IDN, and XML Base specifications, but there are a couple
of issues that are left undefined by those specifications which I am
having trouble with.
The first is error handling behaviour for URIs. Browsers are reasonably
consistent in their handling of invalid URI references such as:
http://example.com/hello world/
...or:
{{%%xx##
...but the URI specification just says that these URI references are
invalid and doesn't really say what to do with them.
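
To make the gap concrete, here is a minimal sketch in Python of a character-level check against the characters RFC 3986 allows in a URI reference. The function name is hypothetical, and the check is necessary rather than sufficient: it does not implement the full ABNF grammar. It only reports that a reference is invalid; what a consumer should do next is exactly what the specification leaves open.

```python
import re
import string

# Character classes from RFC 3986, section 2.
UNRESERVED = string.ascii_letters + string.digits + "-._~"
RESERVED = ":/?#[]@" + "!$&'()*+,;="
ALLOWED = set(UNRESERVED + RESERVED)

# A percent sign must be followed by exactly two hex digits.
PCT_ENCODED = re.compile(r"%[0-9A-Fa-f]{2}")


def has_only_uri_characters(ref):
    """Return True if every character of ref may appear in a URI reference.

    Hypothetical helper: a character-level check only, not a full parser.
    """
    i = 0
    while i < len(ref):
        if ref[i] == "%":
            if not PCT_ENCODED.match(ref, i):
                return False
            i += 3  # skip the whole percent-encoded triplet
        elif ref[i] in ALLOWED:
            i += 1
        else:
            return False
    return True


for candidate in ("http://example.com/hello world/", "{{%%xx##"):
    print(repr(candidate), has_only_uri_characters(candidate))
```

Both examples above fail this check (the first because of the space, the second because of the braces and the malformed percent-encoding), and that is where the specification stops.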
The second is with IRIs and character encodings other than UTF-8. While
browsers reliably encode non-ASCII characters in the path using UTF-8,
non-ASCII characters in the query component are encoded using the
document's character encoding, and not UTF-8, which is incompatible with
how the IRI spec defines things.
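
A small sketch of the behaviour described above, using Python's urllib.parse.quote. The function name, the sample path and query, and the choice of ISO-8859-1 as the document's encoding are assumptions made for illustration; this does not model what browsers do with characters the document's encoding cannot represent.

```python
from urllib.parse import quote


def encode_path_and_query(path, query, document_encoding):
    """Percent-encode a path as UTF-8 and a query in the document's encoding.

    Hypothetical helper illustrating the split described in this message;
    it is not taken from any specification or browser source.
    """
    # Path: non-ASCII characters are always encoded as UTF-8 bytes.
    encoded_path = quote(path, safe="/", encoding="utf-8")
    # Query: non-ASCII characters are encoded in the document's encoding.
    encoded_query = quote(query, safe="=&", encoding=document_encoding)
    return encoded_path + "?" + encoded_query


# The same character, U+00E9, ends up as different bytes in path and query.
print(encode_path_and_query("/caf\u00e9", "q=caf\u00e9", "iso-8859-1"))
# prints /caf%C3%A9?q=caf%E9
```

With a UTF-8 document the two components agree (%C3%A9 in both), so the mismatch with the IRI spec only shows up for documents in other encodings.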
Is there any chance that the URI and IRI specifications might get updated
to handle these issues?
At the moment, I'm working around these issues by "wrapping" the URI specs
with pre- and post-processing steps and by requiring that implementations
use slightly different definitions for the ABNF productions, which is
rather dubious. You can see this work in progress here:
http://www.whatwg.org/specs/web-apps/current-work/#urls
(It's woefully incomplete.) It would be much cleaner if instead HTML5
could just defer to the URI specs for everything URI-related.
Cheers,
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Received on Tuesday, 24 June 2008 10:10:30 UTC