Re: Error handling in URIs

Ian Hickson wrote:

> <!DOCTYPE HTML>
> <title>Test</title>
> <meta charset="ISO-8859-13">
> <a href="results.cgi/&#x017d;?&#x017d;">Link</a>
 
> ...what is the link?

It is whatever the unspecified "HTML" document type
definition says.  For a specified HTML or XHTML
document type, hexadecimal NCRs have denoted Unicode
code points since at least RFC 2070 and/or HTML 4, see
<http://purl.net/net/ucode/017d>

So this is an IRI, not a URI, and invalid in document
types permitting only URIs.  That the HTML 4 spec
is at best fuzzy about this is one of the reasons
why you want HTML5, isn't it?

> Safari, for instance, will fetch the following URI
> (assuming the base URL is http://example.com/):

>    http://example.com/results.cgi/%C5%BD?%DE

Trying to be smart... :-(  If it can deal with UTF-8,
and obviously that is the case, it had better use
%C5%BD in the query as well.  Otherwise the server has
no good chance to figure out what is going on.
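
To make the mismatch concrete, here is a minimal Python sketch
that reproduces the relative reference quoted above.  It only
assumes what the example states: the path is percent-encoded
from UTF-8 and the query from the document charset.  The names
href, mixed and uniform are mine, and this is not a claim about
how Safari is implemented.

```python
from html import unescape
from urllib.parse import quote

# The href as written in the quoted test document, NCRs included.
href = "results.cgi/&#x017d;?&#x017d;"

# A hex NCR names a Unicode code point, whatever the document charset is.
iri_ref = unescape(href)              # "results.cgi/\u017d?\u017d"
path, _, query = iri_ref.partition("?")

# Behaviour described in the quoted example: UTF-8 for the path,
# document charset (ISO-8859-13) for the query.
mixed = quote(path, safe="/") + "?" + quote(query, safe="", encoding="iso-8859-13")

# Same reference with UTF-8 used for both components.
uniform = quote(path, safe="/") + "?" + quote(query, safe="")

print(mixed)
print(uniform)
```

With the mixed form the server gets %C5%BD and %DE for one and
the same character and has to guess the encoding of each part.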

> we have to define the processing that led to two 
> characters in the same URL being encoded using two
> different character encodings.

Just don't; take RFC 3987 "as is".  It is technically
not possible to define something else without running
into logical problems like your Safari + IE examples.
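
For reference, the core of the RFC 3987 mapping (section 3.1)
is small: every non-ASCII character is converted to UTF-8 and
each resulting octet is percent-encoded, in the path and the
query alike.  A rough sketch follows, with a hypothetical helper
name iri_to_uri.  It assumes the input is already a valid IRI in
Unicode, and it skips the normalization step and the optional
IDNA handling of the host.

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode the UTF-8 octets of every non-ASCII character.

    Simplified sketch of the RFC 3987 section 3.1 mapping: ASCII
    characters pass through unchanged, the host is not treated
    specially, and no normalization is applied.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


print(iri_to_uri("http://example.com/results.cgi/\u017d?\u017d"))
```

The document charset does not appear anywhere in this mapping,
so path and query cannot end up in different encodings.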

> Any suggestions would be very welcome.

Stay out of trouble.  There is a list about IRIs, and
if folks want to update RFC 3987 in wild and wonderful
ways they can try their luck on this list.  Follow all
standards down to their damned last comma, or update
them.  Any attempt to "redefine" standards elsewhere,
e.g. directly in HTML5, is doomed.

 Frank

Received on Wednesday, 25 June 2008 00:28:18 UTC