
Re: How browsers display URIs with %-encoding (Opera/Firefox FAIL)

From: Chris Weber <chris@lookout.net>
Date: Thu, 21 Jul 2011 15:44:06 -0700
Message-ID: <4E28ABB6.4080805@lookout.net>
To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
CC: public-iri@w3.org
On 7/21/2011 1:15 PM, Leif Halvard Silli wrote:
> The page in question uses Windows-1252/ISO-8859-1. Question: Would it
> have made a difference if instead of using ISO-8859-1 based percent
> encoding, Martin had typed the letter 'ü' directly?

Yes, it would; see "test 2" at <http://lookout.net/test/iri/mixenc.php>.
I used the same browser builds Martin did, but a slightly different
test setup.

Test 2 from my set corresponds to Martin's Test 1 in that "Dürst" is part
of the path component and encoded in iso-8859-1. The difference is that
he percent-encoded the 'ü' as %FC and I used the raw byte 0xFC. The test
case is represented below, where <0xHH> represents a raw byte.


The results of display are below.

Opera (11.50, Win7):

Note here that the raw byte <0xFC> was visibly converted to UTF-8
<0xC3 0xBC>, and those two bytes were then (presumably) rendered as
iso-8859-1 in the display.
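
To make the presumed effect concrete, here is a minimal Python sketch. It is only an illustration of the byte-level conversion described above, not Opera's actual code path, and the variable names are hypothetical.

```python
# Sketch only: models the presumed display path, not Opera's implementation.
raw = b"D\xfcrst"  # path bytes as they appear in the iso-8859-1 page

# Interpret the page bytes as iso-8859-1, then re-encode as UTF-8:
# the single byte 0xFC becomes the two bytes 0xC3 0xBC.
utf8 = raw.decode("iso-8859-1").encode("utf-8")

# If those UTF-8 bytes are then rendered as iso-8859-1, each byte becomes
# its own character (U+00C3 followed by U+00BC) instead of a single 'ü'.
shown = utf8.decode("iso-8859-1")

print(utf8)
print(shown)
```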

Firefox (5.0, Win7):

IE (8.0.7601.17514, Win7):

Chrome (12.0.742.122, Win7):

Safari (5.0.4 (7533.20.27)):

In all of the above cases, the <0xFC> was transcoded to UTF-8 and 
percent-encoded for the generated HTTP request.
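
The transcoding step can be sketched in Python as follows. This is an approximation of what the browsers appear to do, assuming the page encoding is iso-8859-1; `request_path` and `legacy_path` are names chosen for illustration.

```python
from urllib.parse import quote

raw = b"D\xfcrst"                 # raw bytes from the iso-8859-1 page
text = raw.decode("iso-8859-1")   # the characters "Dürst"

# Transcode to UTF-8 first, then percent-encode the non-ASCII bytes.
request_path = quote(text.encode("utf-8"))

# For comparison: percent-encoding the original iso-8859-1 bytes directly,
# which is the form Martin's test used.
legacy_path = quote(raw)

print(request_path)   # D%C3%BCrst
print(legacy_path)    # D%FCrst
```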


> Because, if, in an ISO-8859-1 encoded page, href="D%FCrst" does not work
> as well as href="Dürst", then I think HTML5 validators in fact should
> warn against use of percent encoding that isn't UTF-8 based.

That would probably be ideal but would not provide for raw data that 
might need to be passed in the IRI, especially the query component.
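
A validator check of the kind suggested could look roughly like the Python sketch below. The function name `percent_encoding_is_utf8` is hypothetical, and this is only one possible way to implement such a warning, not how any existing HTML5 validator does it.

```python
from urllib.parse import unquote_to_bytes


def percent_encoding_is_utf8(component: str) -> bool:
    """Return True if the percent-decoded bytes form valid UTF-8."""
    try:
        unquote_to_bytes(component).decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(percent_encoding_is_utf8("D%C3%BCrst"))  # True
print(percent_encoding_is_utf8("D%FCrst"))     # False
```

Such a check would also flag a query component that deliberately carries raw, non-UTF-8 bytes, which is the limitation noted above.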

Best regards,
Received on Thursday, 21 July 2011 22:44:55 UTC
