Re: Error handling in URIs

From: Charles Lindsey <chl@clerew.man.ac.uk>
Date: Wed, 25 Jun 2008 11:00:02 +0100
To: URI <uri@w3.org>
Message-ID: <op.udat2cwe6hl8nm@clerew.man.ac.uk>

On Wed, 25 Jun 2008 03:11:25 +0100, Ian Hickson <ian@hixie.ch> wrote:

> On Wed, 25 Jun 2008, Frank Ellermann wrote:
>> Ian Hickson wrote:
>>
>> > <!DOCTYPE HTML>
>> > <title>Test</title>
>> > <meta charset="ISO-8859-13">
>> > <a href="results.cgi/&#x017d;?&#x017d;">Link</a>
>>
>> > ...what is the link?
>>
>> It is whatever the unspecified "HTML" document type definition says.
>
> Ok.
>
>
>> So this is an IRI, not a URI, and is invalid in document types permitting
>> only URIs.
>
> Well there's no question that it's invalid, the question is what should
> browsers do with it.
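
The two obvious candidates for that link can be computed mechanically. The
Python sketch below assumes the character references have already been
resolved to U+017D (Ž). It percent-encodes the non-ASCII characters once as
UTF-8 (the IRI-to-URI mapping of RFC 3987) and once as the bytes of the
page's declared charset, ISO-8859-13. The helper name `to_uri` is
hypothetical, not taken from any spec.

```python
from urllib.parse import quote

# The link from the test page after &#x017d; has been resolved to
# U+017D LATIN CAPITAL LETTER Z WITH CARON.
IRI = "results.cgi/\u017d?\u017d"


def to_uri(iri: str, charset: str) -> str:
    """Hypothetical helper: percent-encode the non-ASCII characters of
    `iri` using the bytes of `charset`, leaving '/' and '?' alone so the
    path/query structure survives."""
    return quote(iri, safe="/?", encoding=charset)


# RFC 3987 reading: encode as UTF-8, then percent-encode.
as_utf8 = to_uri(IRI, "utf-8")
# Legacy reading: percent-encode the page's own ISO-8859-13 bytes.
as_latin7 = to_uri(IRI, "iso-8859-13")

print(as_utf8)    # results.cgi/%C5%BD?%C5%BD
print(as_latin7)  # results.cgi/%DE?%DE
```

A server receiving the first form and one receiving the second see entirely
different octets for the same character, which is the crux of the question.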

Essentially, it is up to the server what it accepts. Normally, one expects
IRIs/URIs published by or on behalf of the server to be in a form which
that server understands. It is only queries, which are likely to be
composed by unsuspecting clients, that are the real problem.

In an ideal world, all servers would publish their pages in UTF-8, and the
question would then never arise; and maybe it will be like that one day.

But in the meantime, a sensible strategy for a server whose pages are
published in iso-8859-99 (whatever that might be) would be to accept
IRIs/URIs (and especially queries) %-encoded in iso-8859-99, but also,
*in addition*, to convert incoming UTF-8 (whether in IRIs or %-encoded in
URIs) to its own iso-8859-99.
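
A minimal sketch of that server-side conversion, in Python. It assumes
ISO-8859-13 as a stand-in for the hypothetical iso-8859-99, and it takes
the UTF-8-or-not decision as an input (`source_is_utf8`, a hypothetical
parameter), since making that decision is the hard part.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical stand-in for "iso-8859-99": the single-byte legacy
# charset in which this server publishes its own pages.
NATIVE = "iso-8859-13"


def query_to_native(raw: bytes, source_is_utf8: bool) -> bytes:
    """Return the query as bytes in the server's NATIVE charset.

    `raw` may contain %XX escapes, raw non-ASCII octets (an IRI sent
    as-is), or both; unquote_to_bytes resolves the escapes and passes
    other bytes through unchanged.
    """
    octets = unquote_to_bytes(raw)
    if not source_is_utf8:
        return octets  # already in the server's own charset
    # Transcode UTF-8 to the native charset. Characters with no native
    # equivalent raise UnicodeEncodeError instead of being guessed at.
    return octets.decode("utf-8").encode(NATIVE)
```

For example, `query_to_native(b"%C5%BD", True)` and
`query_to_native(b"%DE", False)` both yield `b"\xde"`, which is Ž in
ISO-8859-13.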

That, of course, leaves the problem of how to distinguish genuine UTF-8
from iso-8859-99 when you see it. Fortunately, it is well known that, given
a sample of 10 or so characters, you can correctly tell on 99.9% of
occasions whether or not it is UTF-8 (and most queries these days seem to
be _much_ longer than 10 characters :-( ).
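
A short sample usually suffices because UTF-8 is highly structured: every
lead byte from 0xC2 to 0xF4 must be followed by the right number of
continuation bytes in the range 0x80 to 0xBF, whereas in single-byte legacy
text a non-ASCII letter is usually followed by an ASCII one. A minimal
Python check, assuming the query has already been percent-decoded to bytes
(`looks_like_utf8` is a hypothetical name):

```python
from typing import Optional


def looks_like_utf8(octets: bytes) -> Optional[bool]:
    """Guess whether `octets` is UTF-8.

    Returns None for pure ASCII, which is identical in UTF-8 and in every
    ISO-8859 charset, so the question does not arise. Otherwise returns
    True if the bytes form valid UTF-8 and False if they do not.
    """
    if octets.isascii():
        return None
    try:
        octets.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

Note that this only rules UTF-8 in or out; a valid UTF-8 result can still,
rarely, be legacy text that happens to be well-formed UTF-8.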

So a sensible strategy for a server would be to try it both ways and see
which makes sense (giving preference to iso-8859-99 in the few cases where
both appear to work). That strategy would probably work often enough to be
useful, and I think we have already agreed that there is no 100% solution.
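
The try-both-ways strategy might be sketched as follows, again assuming
ISO-8859-13 in place of iso-8859-99. What "makes sense" means is left open
above; the `makes_sense` rule used here (reject control characters and
stray symbols such as ½ or ©, which are typical of UTF-8 misread as a Latin
charset) is purely a hypothetical heuristic.

```python
import unicodedata

NATIVE = "iso-8859-13"  # hypothetical stand-in for "iso-8859-99"

# Hypothetical plausibility rule: Unicode categories that rarely occur in
# real queries but often appear when UTF-8 is decoded as a Latin charset.
SUSPECT = {"Cc", "Cf", "Cn", "Co", "Cs", "No", "So"}


def makes_sense(text: str) -> bool:
    return not any(unicodedata.category(ch) in SUSPECT for ch in text)


def decode_query(octets: bytes) -> str:
    """Try both readings; prefer the native one when both make sense."""
    candidates = []
    # Some legacy charsets leave byte values undefined, so even the
    # native decode is guarded.
    try:
        candidates.append(octets.decode(NATIVE))
    except UnicodeDecodeError:
        pass
    try:
        candidates.append(octets.decode("utf-8"))
    except UnicodeDecodeError:
        pass
    for text in candidates:  # native reading first, so it wins ties
        if makes_sense(text):
            return text
    raise ValueError("neither reading makes sense")
```

With this rule, the byte 0xDE decodes to Ž via ISO-8859-13, while the pair
0xC5 0xBD, whose ISO-8859-13 reading is "Å½", falls through to the UTF-8
reading and also gives Ž. A genuine ISO-8859-13 query containing, say, a
degree sign would be rejected, one more illustration that no rule of this
kind is 100% reliable.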

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131                       Web: http://www.cs.man.ac.uk/~chl
Email: chl@clerew.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5
Received on Wednesday, 25 June 2008 10:00:48 GMT
