Re: How browsers display URIs with %-encoding (Opera/Firefox FAIL)

On 7/21/2011 1:15 PM, Leif Halvard Silli wrote:
> The page in question uses Windows-1252/ISO-8859-1. Question: Would it
> have made a difference if instead of using ISO-8859-1 based percent
> encoding, Martin had typed the letter 'ü' directly?

Yes, it would; see "test 2" at <http://lookout.net/test/iri/mixenc.php>.
I used the same browser builds Martin did, but a slightly different
test setup.

Test 2 from my set maps to Martin's Test 1 in that the "Dürst" is part
of the path component and encoded in iso-8859-1. The difference is that
he used the percent-encoded form %FC and I used the raw byte 0xFC. The
test case is represented below, where <0xHH> represents a raw byte.

http://www.example.com/D<0xFC>rst/

The results of display are below.

Opera (11.50, Win7):
   http://www.example.com/DÃ¼rst/

Note here that the raw byte <0xFC> was visibly converted to the UTF-8
sequence <0xC3 0xBC>, which was then (presumably) displayed as if it
were iso-8859-1.
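
To illustrate that guess, here is a minimal Python sketch that models
the two steps. The variable names are my own, and the second step is
only an assumption about what Opera does, not its actual code.

```python
# Raw bytes as they appear in the iso-8859-1 test page.
raw = b"D\xfcrst"

# Step 1: read the page bytes as iso-8859-1 and re-encode as UTF-8.
utf8_bytes = raw.decode("iso-8859-1").encode("utf-8")

# Step 2 (assumed): the UTF-8 bytes are shown as if they were iso-8859-1,
# so each of the two bytes becomes its own visible character.
displayed = utf8_bytes.decode("iso-8859-1")

print(utf8_bytes)
print(displayed)
```

Here <0xC3> and <0xBC> come out as two separate characters in place of
the single 'ü'.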

Firefox (5.0, Win7):
   http://www.example.com/Dürst/

IE (8.0.7601.17514, Win7):
   http://www.example.com/Dürst/

Chrome (12.0.742.122, Win7):
   http://www.example.com/Dürst/

Safari (5.0.4 (7533.20.27)):
   http://www.example.com/Dürst/

In all of the above cases, the <0xFC> was transcoded to UTF-8 and 
percent-encoded for the generated HTTP request.

   http://www.example.com/D%C3%BCrst/
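
The conversion above can be sketched in Python as follows. This is only
a model of the observed result using the standard urllib.parse module;
the variable names are mine, and real browsers obviously do not run
this code.

```python
from urllib.parse import quote

# Path bytes as written in the iso-8859-1 page (raw 0xFC for 'ü').
raw_path = b"/D\xfcrst/"

# Decode using the page encoding to get the Unicode characters.
text_path = raw_path.decode("iso-8859-1")

# Percent-encode the non-ASCII characters as UTF-8, leaving "/" alone.
request_path = quote(text_path, safe="/", encoding="utf-8")

print("http://www.example.com" + request_path)
```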


>
> Because, if, in an ISO-8859-1 encoded page, href="D%FCrst" does not work
> as well as href="Dürst", then I think HTML5 validators in fact should
> warn against use of percent encoding that isn't UTF-8 based.

That would probably be ideal but would not provide for raw data that 
might need to be passed in the IRI, especially the query component.
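
As a small illustration of the raw-data case, the Python sketch below
percent-encodes a few arbitrary bytes for use in a query component. The
payload is a made-up example, not taken from any of the tests. The
result is perfectly usable percent-encoding, yet it is not "UTF-8
based", so a blanket validator warning would flag it.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# Hypothetical binary payload that is not valid UTF-8.
payload = b"\xfc\x00\xff"

# Percent-encode the bytes directly, with no character encoding involved.
encoded = quote_from_bytes(payload)

# The receiving side can recover the exact bytes.
decoded = unquote_to_bytes(encoded)

# Check whether the recovered bytes happen to be valid UTF-8.
try:
    decoded.decode("utf-8")
    is_utf8 = True
except UnicodeDecodeError:
    is_utf8 = False

print(encoded, is_utf8)
```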


Best regards,
Chris

Received on Thursday, 21 July 2011 22:44:55 UTC