Re: Error handling in URIs

Ian Hickson wrote:
> I recently started addressing issues related to URIs in the context of the 
> HTML5 specification. In general I am trying to defer as much as possible 
> to the URI, IRI, IDN, and XML Base specifications, but there are a couple 
> of issues that are left undefined by those specifications which I am 
> having trouble with.
> 
> The first is error handling behaviour for URIs. Browsers are reasonably 
> consistent in their handling of invalid URI references such as:
> 
>    http://example.com/hello world/
> 
> ...or:
> 
>    {{%%xx##
> 
> ...but the URI specification just says that these URI references are 
> invalid and doesn't really say what to do with them.

Well, the URI spec doesn't need to. It's an error.

If the HTML5 spec needs to define the behavior because that's what the UAs 
do, that's fine, and it can be done there.
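For illustration only, here is a rough Python sketch of one error-recovery rule an HTML5-level spec *could* define. The function name `recover_uri_reference` and the rule itself are assumptions. It escapes whatever RFC 3986 does not allow, keeps well-formed %XX escapes, and keeps only the first "#". It is not a description of what any browser or spec is known to do, and it ignores component-specific rules, such as "[" and "]" being allowed only in the host.

```python
import re
from urllib.parse import quote

# Characters left unescaped: RFC 3986 gen-delims and sub-delims except "#",
# which is handled separately. quote() always keeps letters, digits and "-._~".
_SAFE = ":/?[]@!$&'()*+,;="
_VALID_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def recover_uri_reference(ref: str) -> str:
    """Hypothetical recovery rule: escape characters a URI reference may not contain."""
    out = []
    seen_fragment = False
    i = 0
    while i < len(ref):
        escape = _VALID_ESCAPE.match(ref, i)
        if escape:
            # Keep well-formed percent-escapes as they are.
            out.append(escape.group(0))
            i = escape.end()
            continue
        ch = ref[i]
        if ch == "#" and not seen_fragment:
            # The first "#" starts the fragment; any later "#" must be escaped.
            seen_fragment = True
            out.append(ch)
        else:
            # Escape anything not in _SAFE: space, "{", a stray "%", a second "#",
            # and non-ASCII characters (as UTF-8 octets).
            out.append(quote(ch, safe=_SAFE))
        i += 1
    return "".join(out)


print(recover_uri_reference("http://example.com/hello world/"))  # http://example.com/hello%20world/
print(recover_uri_reference("{{%%xx##"))  # %7B%7B%25%25xx#%23
```

Escaping a stray "%" as "%25" rather than passing it through is one design choice among several. Whatever rule is picked is exactly what would need to be written down in HTML5.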

> The second is with IRIs and character encodings other than UTF-8. While 
> browsers reliably encode non-ASCII characters in the path using UTF-8, 
> non-ASCII characters in the query component are encoded using the 
> document's character encoding, and not UTF-8, which is incompatible with 
> how the IRI spec defines things.

Could you please be more specific? Any URI is an IRI, so a query 
component percent-encoded from an encoding other than UTF-8 is still a legal IRI.
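The following minimal Python sketch makes the point concrete. It uses the illustrative value "café", windows-1252 as the non-UTF-8 document encoding, and a hypothetical helper `is_valid_query`. Percent-encoding the same query value under two encodings yields two different ASCII strings, and both are syntactically valid URI queries, and therefore valid IRIs. Only decoding reveals the difference, because the octets make sense only to a consumer that knows which encoding was used.

```python
import re
from urllib.parse import quote, unquote

# RFC 3986: query = *( pchar / "/" / "?" ), where pchar covers unreserved
# characters, sub-delims, ":", "@" and percent-encoded octets.
_QUERY = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*")


def is_valid_query(query: str) -> bool:
    """True if `query` is a syntactically valid URI query component."""
    return _QUERY.fullmatch(query) is not None


value = "café"  # illustrative non-ASCII query value

# Percent-encode the same value under two different document encodings.
utf8_query = "q=" + quote(value, encoding="utf-8")          # q=caf%C3%A9
latin_query = "q=" + quote(value, encoding="windows-1252")  # q=caf%E9

# Both are plain ASCII and valid URI queries, hence also valid IRIs.
print(is_valid_query(utf8_query), is_valid_query(latin_query))  # True True

# The octets only round-trip when the consumer knows the encoding used.
print(unquote(latin_query, encoding="windows-1252"))  # q=café
print(unquote(latin_query))  # decodes as UTF-8 by default; %E9 becomes U+FFFD
```

So the syntax is not the issue. The incompatibility lies in the IRI-to-URI mapping of RFC 3987, which would produce the UTF-8 form for a raw non-ASCII character in the query.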

> ...

BR, Julian

Received on Tuesday, 24 June 2008 11:45:21 UTC