- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Tue, 24 Jun 2008 22:59:32 +0200
- To: Ian Hickson <ian@hixie.ch>
- CC: Anne van Kesteren <annevk@opera.com>, uri@w3.org
Ian Hickson wrote:
> ...
>> You could change the algorithm how to get to the IRI in the first place, such
>> as making it equivalent to:
>>
>> <a href="results.cgi/Ž?Þ">
>>
>> ...in which case the standard IRI->URI conversion would yield the expected
>> result.
>
> I'm not really sure what that would look like, compared to what I have
> now. Could you elaborate?

1. Consider the input an IRI.
2. Convert non-ASCII characters in the query part to URI characters by
   encoding them in the document's character set, then percent-escaping.
3. Go on with regular IRI->URI conversion.

Of course that's almost the same as re-doing all the work done in the IRI
spec, but at least you wouldn't need to worry about IDN stuff.

>>> IE actually sends http://example.com/results.cgi/%C5%BD?* where "*" is the
>>> ISO-8859-13-encoded 8-bit byte for that character. If you target an
>> Now that suggests to me that there is no interop between IE and Safari, and
>> thus whatever you specify *may* break something.
>
> The situation is far from perfect, indeed. That's why we need specs that
> define error handling, to avoid this mess where Web content relies on
> unspecified issues and forces interoperability through
> reverse-engineering. (In this particular case, the differences between IE
> and the other browsers don't matter much because sites tend to only use
> one encoding, so the encoding source doesn't matter, and tend to convert
> %-escaped bits into their equivalent 8 bit octets before processing them,
> so they see the 8-bit URIs and the %-escaped URIs as equivalent.)

As long as no intermediary re-encodes the resource.

>>>> Now, that being said, is there anything HTML5 could do so we can get
>>>> closer to a strict UTF-8 world in the future? Such as allowing
>>>> servers to serve document in an encoding != UTF-8, but still get
>>>> query parameters to be consistently encoded in UTF-8?
>>> There might be, but I don't see any way to get there at the moment.
>>> Any suggestions would be very welcome.
>> A form attribute through which the site can state: "I want
>> UTF-8-encoding-then-percent-escaping, no matter what the document
>> encoding was"?
>
> We have that already. It doesn't really help regular links.

Regular links aren't a problem (if I understand "regular" correctly),
because the site owner generated them.

>> Or potentially, in a more distant future, some way of specifying URI
>> templates (*)?
>>
>> (*) Yes, when they are ready...
>
> Maybe.

BR, Julian
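
For illustration only, the three-step conversion proposed in this message could be sketched roughly as follows. This is a minimal sketch and not text from any spec: the function name iri_to_uri and its document_encoding parameter are hypothetical, IDN handling of the host is deliberately left out, and only the path and query components are escaped.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def _escape_non_ascii(text: str, encoding: str) -> str:
    """Percent-escape only the non-ASCII characters of `text`.

    Each non-ASCII character is first encoded to bytes in `encoding`
    (raising UnicodeEncodeError if the encoding cannot represent it),
    then every resulting byte is written as %XX.
    """
    return "".join(
        ch if ord(ch) < 128 else quote(ch.encode(encoding), safe="")
        for ch in text
    )


def iri_to_uri(iri: str, document_encoding: str) -> str:
    """Hypothetical helper following the three steps in the message."""
    # Step 1: consider the input an IRI and split it into components.
    parts = urlsplit(iri)
    # Step 2: the query part uses the document's character set.
    query = _escape_non_ascii(parts.query, document_encoding)
    # Step 3: regular IRI->URI conversion (UTF-8) for the path.
    # Assumption: the host is ASCII, so no IDN processing is attempted.
    path = _escape_non_ascii(parts.path, "utf-8")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


if __name__ == "__main__":
    print(iri_to_uri("http://example.com/results.cgi/\u017d?\u017d", "iso-8859-13"))
```

With an ISO-8859-13 document, the path character is escaped as its UTF-8 octets while the query character is escaped as its single ISO-8859-13 octet, so the same character ends up with two different escapings in one URI.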
Received on Tuesday, 24 June 2008 21:00:19 UTC