- From: Chris Weber <chris@lookout.net>
- Date: Thu, 21 Jul 2011 15:44:06 -0700
- To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- CC: public-iri@w3.org
On 7/21/2011 1:15 PM, Leif Halvard Silli wrote:
> The page in question uses Windows-1252/ISO-8859-1. Question: Would it
> have made a difference if instead of using ISO-8859-1 based percent
> encoding, Martin had typed the letter 'ü' directly?

Yes it would, see "test 2" at <http://lookout.net/test/iri/mixenc.php>. I used the same browser builds Martin did, but a slightly different test setup.

Test 2 from my set maps to Martin's Test 1 in that the "Dürst" is part of the path component and encoded in iso-8859-1; he percent-encoded it as %FC and I used the raw byte 0xFC. The test case is represented below, where <0xHH> represents a raw byte.

http://www.example.com/D<0xFC>rst/

The displayed results are below.

Opera (11.50, Win7):
http://www.example.com/Dürst/

Note here that the raw byte <0xFC> was visibly converted to the UTF-8 byte sequence <0xC3 0xBC>, which was then (presumably) rendered as iso-8859-1 in the display.

Firefox (5.0, Win7):
http://www.example.com/Dürst/

IE (8.0.7601.17514, Win7):
http://www.example.com/Dürst/

Chrome (12.0.742.122, Win7):
http://www.example.com/Dürst/

Safari (5.0.4 (7533.20.27)):
http://www.example.com/Dürst/

In all of the above cases, the <0xFC> was transcoded to UTF-8 and percent-encoded for the generated HTTP request.

http://www.example.com/D%C3%BCrst/

> Because, if, in an ISO-8859-1 encoded page, href="D%FCrst" does not work
> as well as href="Dürst", then I think HTML5 validators in fact should
> warn against use of percent encoding that isn't UTF-8 based.

That would probably be ideal, but it would not provide for raw data that might need to be passed in the IRI, especially in the query component.

Best regards,
Chris
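
The transcoding described in this message can be reproduced offline. The following is a minimal Python sketch, not the browsers' actual code: it assumes the page encoding is iso-8859-1, and the helper names `transcode_and_escape` and `mojibake_display` are hypothetical, introduced only for illustration. It shows how a raw 0xFC byte in the path becomes %C3%BC in the request, how the same UTF-8 bytes look when rendered as iso-8859-1, and why a pre-escaped %FC is a different case.

```python
from urllib.parse import quote


def transcode_and_escape(raw_path: bytes, page_encoding: str = "iso-8859-1") -> str:
    """Decode raw path bytes using the page encoding, re-encode them as
    UTF-8, and percent-encode the result (hypothetical helper)."""
    text = raw_path.decode(page_encoding)          # 0xFC -> U+00FC
    return quote(text.encode("utf-8"), safe="/")   # U+00FC -> %C3%BC


def mojibake_display(raw_path: bytes, page_encoding: str = "iso-8859-1") -> str:
    """Show what the UTF-8 bytes look like if they are rendered as
    iso-8859-1 instead of UTF-8 (hypothetical helper)."""
    utf8_bytes = raw_path.decode(page_encoding).encode("utf-8")
    return utf8_bytes.decode("iso-8859-1")


raw = b"/D\xfcrst/"  # the test case: D<0xFC>rst in the path

print(transcode_and_escape(raw))   # /D%C3%BCrst/
print(mojibake_display(raw))       # UTF-8 bytes C3 BC shown as two Latin-1 characters
print(quote(raw, safe="/"))        # /D%FCrst/ (escaping the legacy byte without transcoding)
```

The last line corresponds to the percent-encoded %FC form: once the byte is already escaped, there is no raw character left to transcode, which is why the two forms can behave differently.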
Received on Thursday, 21 July 2011 22:44:55 UTC