- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Wed, 25 Jun 2008 02:29:12 +0200
- To: uri@w3.org
Ian Hickson wrote:

> <!DOCTYPE HTML>
> <title>Test</title>
> <meta charset="ISO-8859-13">
> <a href="results.cgi/Ž?Ž">Link</a>
>
> ...what is the link?

It is whatever the unspecified "HTML" document type definition says. In the case of a specified HTML or XHTML document type, hex. NCRs have represented Unicode code points since at least RFC 2070 and/or HTML 4, see <http://purl.net/net/ucode/017d>

So this is an IRI, not a URI, and it is invalid in document types permitting only URIs. That the HTML 4 spec. is at best fuzzy about this is one of the reasons why you want HTML5, isn't it?

> Safari, for instance, will fetch the following URI
> (assuming the base URL is http://example.com/):
> http://example.com/results.cgi/%C5%BD?%DE

Trying to be smart... :-( If it can deal with UTF-8, and obviously that is the case, it had better use %C5%BD in the query as well. Otherwise the server has no good chance to figure out what is going on.

> we have to define the processing that led to two
> characters in the same URL being encoded using two
> different character encodings.

Just don't, take RFC 3987 "as is". It is technically not possible to define something else without running into logical problems like your Safari + IE examples.

> Any suggestions would be very welcome.

Stay out of trouble. There is a list about IRIs, and if folks want to update RFC 3987 in wild and wonderful ways they can try their luck on that list. Follow all standards down to their damned last comma, or update them. Any attempt to "redefine" standards elsewhere, e.g. directly in HTML5, is doomed.

Frank
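For illustration, the two encodings in the Safari example can be reproduced with a minimal Python sketch. It assumes the character in question is U+017D, as the purl.net link above suggests; the helper name `percent_encode` and the variable names are made up for this example and come from no specification.

```python
from urllib.parse import quote

# U+017D, LATIN CAPITAL LETTER Z WITH CARON (assumed from the link above)
CHAR = "\u017d"
BASE = "http://example.com/results.cgi/"


def percent_encode(text, charset):
    """Percent-encode text after converting it to bytes in the given charset."""
    return quote(text, safe="", encoding=charset)


# UTF-8 conversion, which is what RFC 3987 uses when mapping an IRI to a URI
utf8_form = percent_encode(CHAR, "utf-8")

# Conversion using the document's declared charset, ISO-8859-13
legacy_form = percent_encode(CHAR, "iso-8859-13")

# The same encoding in path and query, as argued for above
consistent_uri = BASE + utf8_form + "?" + utf8_form

# The mixed form quoted for Safari: UTF-8 in the path, page charset in the query
mixed_uri = BASE + utf8_form + "?" + legacy_form

print(consistent_uri)
print(mixed_uri)
```

A server receiving the mixed form has to guess that the path and the query use different character encodings, which is the problem described above.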
Received on Wednesday, 25 June 2008 00:28:18 UTC