Re: How browsers display URIs with %-encoding (Opera/Firefox FAIL)

On Thu, Jul 21, 2011 at 3:44 PM, Chris Weber <chris@lookout.net> wrote:

> On 7/21/2011 1:15 PM, Leif Halvard Silli wrote:
>
>> The page in question uses Windows-1252/ISO-8859-1. Question: Would it
>> have made a difference if instead of using ISO-8859-1 based percent
>> encoding, Martin had typed the letter 'ü' directly?
>>
>
> Yes it would, see "test 2" at <http://lookout.net/test/iri/mixenc.php>.
>  Using the same browser builds Martin did, but a slightly different test
> setup.
>

Your test and Martin's test are totally different. 0xFC embedded in an
ISO-8859-1 encoded page represents the character U+00FC [1], while a
stand-alone "%FC" in the path part, in a page of any encoding, should NOT be
interpreted as a character because it is not a valid UTF-8 sequence.

Jungshik

[1] 0xFC (raw byte) in ISO-8859-5 would be the character U+045C, but a
stand-alone '%FC' in the path part should remain escaped no matter what the
page encoding is.
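
To make the distinction concrete, here is a minimal Python sketch. The helper
name decode_path_escape is hypothetical and the "unescape only if valid UTF-8"
rule is just my reading of the behaviour described above, not any browser's
actual code.

```python
from urllib.parse import unquote_to_bytes


def decode_path_escape(escaped: str) -> str:
    """Unescape a percent-encoded path only if its bytes are valid UTF-8.

    Hypothetical helper: if the unescaped bytes do not form valid UTF-8,
    the original escaped string is returned unchanged.
    """
    raw = unquote_to_bytes(escaped)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return escaped


# A raw byte 0xFC depends on the page encoding for its meaning.
raw_byte = b"\xfc"
print(hex(ord(raw_byte.decode("iso-8859-1"))))  # 0xfc
print(hex(ord(raw_byte.decode("iso-8859-5"))))  # 0x45c

# A stand-alone %FC is not valid UTF-8, so it stays escaped.
print(decode_path_escape("D%FCrst"))     # D%FCrst
# A UTF-8 based escape can be shown as a character.
print(decode_path_escape("D%C3%BCrst"))  # Dürst
```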





> Test 2 from my set maps to Martin's Test 1 in that the "Dürst" is a part of
> the path component and encoded in iso-8859-1 - he percent-encoded %FC and I
> used the raw byte 0xFC.  The test case is represented below, where <0xHH>
> represents a raw byte.
>
> http://www.example.com/D<0xFC>rst/
>
> The results of display are below.
>
> Opera (11.50, Win7):
>  http://www.example.com/Dürst/
>
> Note here that the raw byte <0xFC> was visibly converted to UTF-8 <0xC3
> 0xBC> and (presumably) rendered as iso-8859-1 in the display.
>
> Firefox (5.0, Win7):
>  http://www.example.com/Dürst/
>
>
> IE (8.0.7601.17514, Win7):
>  http://www.example.com/Dürst/
>
>
> Chrome (12.0.742.122, Win7):
>  http://www.example.com/Dürst/
>
>
> Safari (5.0.4 (7533.20.27)):
>  http://www.example.com/Dürst/
>
> In all of the above cases, the <0xFC> was transcoded to UTF-8 and
> percent-encoded for the generated HTTP request.
>
>  http://www.example.com/D%C3%BCrst/
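
The transcoding step described in the quoted text above can be reproduced
with a short Python sketch. It assumes the page bytes are decoded as
ISO-8859-1 and the request path is built by percent-encoding the UTF-8 form;
the variable names are illustrative only. The last two lines show how
reading those UTF-8 bytes as ISO-8859-1 would yield the string quoted in the
Opera result.

```python
from urllib.parse import quote

# Path bytes as they appear in the ISO-8859-1 encoded page (raw 0xFC).
raw_path = b"/D\xfcrst/"

# Step 1: interpret the bytes using the page encoding.
text_path = raw_path.decode("iso-8859-1")

# Step 2: transcode to UTF-8 and percent-encode for the HTTP request.
request_path = quote(text_path, safe="/", encoding="utf-8")
print(request_path)  # /D%C3%BCrst/

# Reading the UTF-8 bytes <0xC3 0xBC> as ISO-8859-1 gives two characters.
mojibake = text_path.encode("utf-8").decode("iso-8859-1")
print(mojibake)  # /Dürst/
```
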
>
>
>
>
>> Because, if, in an ISO-8859-1 encoded page, href="D%FCrst" does not work
>> as well as href="Dürst", then I think HTML5 validators in fact should
>> warn against use of percent encoding that isn't UTF-8 based.
>>
>
> That would probably be ideal but would not provide for raw data that might
> need to be passed in the IRI, especially the query component.
>
>
> Best regards,
> Chris
>
>

Received on Friday, 22 July 2011 01:10:06 UTC