Re: Error handling in URIs

Ian Hickson wrote:
> ...
>> You could change the algorithm how to get to the IRI in the first place, such
>> as making it equivalent to:
>>
>>  <a href="results.cgi/&#x017d;?&#xde;">
>>
>> ...in which case the standard IRI->URI conversion would yield the expected
>> result.
> 
> I'm not really sure what that would look like, compared to what I have 
> now. Could you elaborate?

1. Consider the input an IRI

2. Convert non-ASCII characters in the query part to URI characters by 
encoding them in the document's character encoding, then percent-escaping

3. Go on with regular IRI->URI conversion.

Of course that's almost the same as re-doing all the work done in the 
IRI spec, but at least you wouldn't need to worry about IDN stuff.
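
The three steps above can be sketched roughly as follows. This is only an
illustration, not spec text: the function names (iri_to_uri,
escape_non_ascii) and the choice of ISO-8859-1 as the document encoding in
the usage note are assumptions of mine. The sketch does no IDN processing,
and it does not handle characters that the document encoding cannot
represent (encode() will simply raise).

```python
from urllib.parse import urlsplit, urlunsplit


def escape_non_ascii(text: str, encoding: str) -> str:
    """Percent-escape every non-ASCII character using the given encoding."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            # ASCII is left untouched, including existing %-escapes.
            out.append(ch)
        else:
            out.append("".join(f"%{b:02X}" for b in ch.encode(encoding)))
    return "".join(out)


def iri_to_uri(iri: str, document_encoding: str) -> str:
    """Hypothetical helper following steps 1-3 (no IDN handling)."""
    # Step 1: treat the input as an IRI and split it into components.
    parts = urlsplit(iri)

    # Step 2: query part only, encoded in the document's encoding.
    query = escape_non_ascii(parts.query, document_encoding)
    pre_escaped = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, query, parts.fragment)
    )

    # Step 3: regular IRI->URI conversion, i.e. UTF-8 then percent-escaping
    # for whatever non-ASCII is left. The query is already pure ASCII here.
    return escape_non_ascii(pre_escaped, "utf-8")
```

With the href from the example and an assumed ISO-8859-1 document,
iri_to_uri("results.cgi/\u017d?\u00de", "iso-8859-1") gives
"results.cgi/%C5%BD?%DE": the path is escaped as UTF-8 and the query in the
document encoding.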

>>> IE actually sends http://example.com/results.cgi/%C5%BD?* where "*" is the
>>> ISO-8859-13-encoded 8-bit byte for that character. If you target an 
>> Now that suggests to me that there is no interop between IE and Safari, and
>> thus whatever you specify *may* break something.
> 
> The situation is far from perfect, indeed. That's why we need specs that 
> define error handling, to avoid this mess where Web content relies on 
> unspecified issues and forces interoperability through 
> reverse-engineering. (In this particular case, the differences between IE 
> and the other browsers don't matter much because sites tend to only use 
> one encoding, so the encoding source doesn't matter, and tend to convert 
> %-escaped bits into their equivalent 8 bit octets before processing them, 
> so they see the 8-bit URIs and the %-escaped URIs as equivalent.)

As long as no intermediary re-encodes the resource.

>>>> Now, that being said, is there anything HTML5 could do so we can get 
>>>> closer to a strict UTF-8 world in the future? Such as allowing 
>>>> servers to serve documents in an encoding != UTF-8, but still get 
>>>> query parameters to be consistently encoded in UTF-8?
>>> There might be, but I don't see any way to get there at the moment. 
>>> Any suggestions would be very welcome.
>> A form attribute through which the site can state: "I want 
>> UTF-8-encoding-then-percent-escaping, no matter what the document 
>> encoding was"?
> 
> We have that already. It doesn't really help regular links.

Regular links aren't a problem (if I understand "regular" correctly), 
because the site owner generated them.

>> Or potentially, in a more distant future, some way of specifying URI 
>> templates (*)?
>>
>> (*) Yes, when they are ready...
> 
> Maybe.

BR, Julian

Received on Tuesday, 24 June 2008 21:00:19 UTC