- From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Date: Mon, 25 Jul 2011 15:51:19 +0900
- To: Chris Weber <chris@lookout.net>, Anne van Kesteren <annevk@opera.com>
- CC: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>, public-iri@w3.org, Charles McCathieNevile <chaals@opera.com>
Hello Chris, others,

On 2011/07/22 7:44, Chris Weber wrote:
> On 7/21/2011 1:15 PM, Leif Halvard Silli wrote:
>> The page in question uses Windows-1252/ISO-8859-1. Question: Would it
>> have made a difference if instead of using ISO-8859-1 based percent
>> encoding, Martin had typed the letter 'ü' directly?
>
> Yes it would, see "test 2" at <http://lookout.net/test/iri/mixenc.php>.
> Using the same browser builds Martin did, but a slightly different test
> setup.

Great test!

> Test 2 from my set maps to Martin's Test 1 in that the "Dürst" is a part
> of the path component and encoded in iso-8859-1 - he percent-encoded %FC
> and I used the raw byte 0xFC. The test case is represented below, where
> <0xHH> represents a raw byte.
>
> http://www.example.com/D<0xFC>rst/
>
> The results of display are below.
>
> Opera (11.50, Win7):
> http://www.example.com/Dürst/
>
> Note here that the raw byte <0xFC> was visibly converted to Unicode
> <0xC3 0xBC> and displayed as iso-8859-1 (presumably) in the display.

Double-encoding! What a mess! I hope this can be fixed soon. I have put
some people from Opera into the cc. Anne should be on the list anyway,
but I hope he notices it more quickly.

> Firefox (5.0, Win7):
> http://www.example.com/Dürst/
>
> IE (8.0.7601.17514, Win7):
> http://www.example.com/Dürst/
>
> Chrome (12.0.742.122, Win7):
> http://www.example.com/Dürst/
>
> Safari (5.0.4 (7533.20.27)):
> http://www.example.com/Dürst/
>
> In all of the above cases, the <0xFC> was transcoded to UTF-8 and
> percent-encoded for the generated HTTP request.
>
> http://www.example.com/D%C3%BCrst/

All these just work as expected, according to RFC 3987. If Opera got
fixed, we would be perfect :-!.

>> Because, if, in an ISO-8859-1 encoded page, href="D%FCrst" does not work
>> as well as href="Dürst", then I think HTML5 validators in fact should
>> warn against use of percent encoding that isn't UTF-8 based.
>
> That would probably be ideal but would not provide for raw data that
> might need to be passed in the IRI, especially the query component.

The query component is a separate issue, and I think there should be
separate tests (including browser display) for it.

The other issue is that there might be a server that e.g. has resource
names encoded in ISO-8859-1, or where a resource name otherwise contains
a byte such as <0xFC>, and for such a server, changing from
href="D%FCrst" to href="Dürst" would be a bad idea.

Regards,    Martin.
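
The behaviour discussed in this thread can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not what
any browser actually runs: the function name path_to_request and the
variable names are hypothetical, and the Opera display is modelled on
Chris's "presumably" (UTF-8 bytes read back as ISO-8859-1).

```python
from urllib.parse import quote, unquote_to_bytes

# Path bytes as they sit in the ISO-8859-1 encoded page: D<0xFC>rst
raw_path = b"/D\xfcrst/"


def path_to_request(raw: bytes, page_encoding: str = "iso-8859-1") -> str:
    """Hypothetical helper: decode the path using the page encoding, then
    percent-encode its UTF-8 bytes, as the thread reports for the
    generated HTTP request."""
    text = raw.decode(page_encoding)  # "/Dürst/"
    return quote(text, safe="/")      # quote() uses UTF-8 by default


request_path = path_to_request(raw_path)

# Assumed model of the double-encoded display: the UTF-8 bytes
# <0xC3 0xBC> are shown as if they were ISO-8859-1 characters.
double_encoded = "Dürst".encode("utf-8").decode("iso-8859-1")

# An already percent-encoded %FC is left alone by RFC 3987 mapping; it
# still names the single raw byte 0xFC on the server side.
legacy_bytes = unquote_to_bytes("/D%FCrst/")

print(request_path)
print(double_encoded)
print(legacy_bytes)
```

The last value is the reason for the caveat about ISO-8859-1 servers:
href="D%FCrst" asks for the byte <0xFC>, while href="Dürst" ends up
asking for <0xC3 0xBC>, so the two are not interchangeable.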
Received on Monday, 25 July 2011 06:52:48 UTC