- From: Charles Lindsey <chl@clerew.man.ac.uk>
- Date: Wed, 25 Jun 2008 11:00:02 +0100
- To: URI <uri@w3.org>
On Wed, 25 Jun 2008 03:11:25 +0100, Ian Hickson <ian@hixie.ch> wrote:

> On Wed, 25 Jun 2008, Frank Ellermann wrote:
>> Ian Hickson wrote:
>>
>> > <!DOCTYPE HTML>
>> > <title>Test</title>
>> > <meta charset="ISO-8859-13">
>> > <a href="results.cgi/Ž?Ž">Link</a>
>>
>> > ...what is the link?
>>
>> It is whatever the unspecified "HTML" document type definition says.
>
> Ok.
>
>
>> So this is an IRI, no URI, and invalid in document types permitting only
>> URIs.
>
> Well there's no question that it's invalid, the question is what should
> browsers do with it.

Essentially, it is up to the browser what it accepts. Normally, one expects IRIs/URIs published by or on behalf of the browser to be in a form which that browser understands. It is only queries, which are likely to be composed by unsuspecting clients, that are the real problem.

In an ideal world, all browsers would publish their pages in UTF-8, and the question would then never arise; and maybe it will be like that one day. But in the meantime, a sensible strategy for a browser whose pages were published in iso-8859-99 (whatever that might be) would be to accept IRIs/URIs (and especially queries) %-encoded in iso-8859-99, but also, *in addition*, to convert incoming UTF-8 (whether in IRIs or %-encoded in URIs) to its own iso-8859-99.

That, of course, leaves the problem of how to distinguish genuine UTF-8 from iso-8859-99 when you see it. Fortunately, it is well known that given a sample of 10 or so characters you can correctly tell on 99.9% of occasions whether or not it is UTF-8 (and most queries these days seem to be _much_ longer than 10 characters :-( ).

So a sensible strategy for a browser would be to try it both ways and see which made sense (giving preference to iso-8859-99 in the few cases where both appeared to work). That strategy would probably work often enough to be useful, and I think we have already agreed that there is no 100% solution.

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131                     Web: http://www.cs.man.ac.uk/~chl
Email: chl@clerew.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5
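The "try it both ways, prefer the page charset when both work" strategy described in the message can be illustrated with a minimal Python sketch. It rests on several assumptions that are not part of the original post. The query is assumed to arrive percent-encoded. ISO-8859-13, taken from the quoted HTML example, stands in for the placeholder "iso-8859-99". The "does it make sense?" test is a crude, hypothetical heuristic that requires every non-ASCII character to be a letter; a real implementation would need something better.

```python
import unicodedata
from urllib.parse import unquote_to_bytes

# Assumption: the page's legacy charset. The message uses the placeholder
# "iso-8859-99"; ISO-8859-13 from the quoted example stands in for it.
LEGACY_CHARSET = "iso-8859-13"


def looks_plausible(text: str) -> bool:
    """Hypothetical 'does this make sense?' test: every non-ASCII
    character must be a letter (Unicode category L*). Symbols such as
    '1/2' fractions or C1 control characters count against a guess."""
    return all(ch.isascii() or unicodedata.category(ch).startswith("L")
               for ch in text)


def decode_query(percent_encoded: str, legacy: str = LEGACY_CHARSET) -> str:
    """Decode a percent-encoded query by trying both the legacy charset
    and UTF-8, preferring the legacy charset when both look sensible."""
    raw = unquote_to_bytes(percent_encoded)
    legacy_text = raw.decode(legacy, errors="replace")
    try:
        utf8_text = raw.decode("utf-8")  # strict: raises on malformed UTF-8
    except UnicodeDecodeError:
        return legacy_text  # not well-formed UTF-8, so treat it as legacy
    if looks_plausible(legacy_text):
        return legacy_text  # both readings work: prefer the page charset
    if looks_plausible(utf8_text):
        return utf8_text  # only the UTF-8 reading makes sense
    return legacy_text  # neither looks right; fall back to the page charset
```

Under these assumptions, `decode_query("%DE")` and `decode_query("%C5%BD")` both yield "Ž". In the first case the byte 0xDE is Ž in ISO-8859-13 and is not well-formed UTF-8. In the second case the ISO-8859-13 reading "Å½" fails the letter test, so the UTF-8 reading wins. Pure ASCII input decodes identically either way.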
Received on Wednesday, 25 June 2008 10:00:48 UTC