Re: How browsers display IRI's with mixed encodings

Hi,

You didn't tell us exactly what you did. Could you describe the steps you
took?

Did you put these URLs in an HTML page (in an href?)? What is the declared
encoding of the HTML page: ISO-8859-1 or UTF-8?

Thanks,

Jungshik

On Thu, Jul 21, 2011 at 5:02 PM, Chris Weber <chris@lookout.net> wrote:

> I'm going on a tangent from Martin's intent in the previous email, but it
> seems in the same vein overall.  I was including some mixed encoding tests -
> iso-8859-1 mixed with UTF-8 in a hyperlink on a transitional HTML page
> served with the "iso-8859-1" Content-Type.  The results are similar to
> Martin's test in the way bytes representing UTF-8 will be treated as such
> (most often) even in an iso-8859-1 page encoding.
>
> From the test page at <http://lookout.net/test/iri/mixenc.php>,
> Test 3 mixes the raw bytes which would represent U+FF21 FULLWIDTH LATIN
> CAPITAL LETTER A in UTF-8, along with iso-8859-1 raw bytes for the "ü" in
> "Dürst".  The following hyperlink represents the test case where <0xNN> is a
> raw byte.
>
> http://www.example.com/D<0xFC>rst/?<0xEF 0xBC 0xA1>
>
>
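
To make the test case concrete, here is a minimal Python sketch of the byte sequence described above. The variable and function names are mine, not taken from the test page; the bytes themselves are the ones quoted in the message.

```python
# Raw bytes of the test hyperlink as described in the message:
# 0xFC is "ü" in iso-8859-1, and 0xEF 0xBC 0xA1 is U+FF21 in UTF-8.
raw_href = b"http://www.example.com/D\xfcrst/?\xef\xbc\xa1"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Split at the first "?" so each component can be checked on its own.
path_part, query_part = raw_href.split(b"?", 1)

whole_is_utf8 = is_valid_utf8(raw_href)    # the lone 0xFC is not valid UTF-8
path_is_utf8 = is_valid_utf8(path_part)
query_is_utf8 = is_valid_utf8(query_part)
```

The link as a whole is not valid UTF-8, but the query component on its own is, which is why a browser could plausibly treat the two parts differently.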


> The results of the display are as follows.
>
> Opera (11.50, Win7):
>  http://www.example.com/Dürst/?%EF%BC%A1
>
> Firefox (5.0, Win7):
>  http://www.example.com/Dürst/?**
>
> IE (8.0.7601.17514, Win7):
>  http://www.example.com/Dürst/?ï¼¡
>
> Chrome (12.0.742.122, Win7):
>  http://www.example.com/Dürst/?**
>
> Safari (5.0.4 (7533.20.27)):
>  http://www.example.com/Dürst/?**
>
> With the exception of IE, all of the above generated the following HTTP
> request:
>
>  GET /D%C3%BCrst/?%EF%BC%A1
>
> IE of course does not escape the bytes in the query string.
>
>  GET /D%C3%BCrst/?A
>
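
One reading that is consistent with the non-IE request above: the path is decoded using the page encoding and re-encoded as UTF-8 before escaping, while the query bytes are percent-encoded unchanged. The sketch below only reproduces that request line under this assumption; it is not a claim about what any browser does internally.

```python
from urllib.parse import quote

# Raw bytes from the test link, split into path and query.
raw_path = b"/D\xfcrst/"
raw_query = b"\xef\xbc\xa1"

# Assumption: the path is interpreted in the page encoding (iso-8859-1),
# then percent-encoded as UTF-8 (quote() encodes str input as UTF-8 by default).
path = quote(raw_path.decode("iso-8859-1"), safe="/")

# Assumption: the query bytes are percent-encoded as they are, with no
# conversion to another encoding.
query = quote(raw_query, safe="")

request_line = f"GET {path}?{query}"
```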
> I tried to capture some of these test results into a table form at:
> <https://spreadsheets0.google.com/spreadsheet/ccc?key=0AifoWoA0trUndEZSTlRRNnd5MzE3N3RYOVlIVFFMREE&hl=en_US#gid=5>
>
> A question for browser implementers - In some cases it's obvious (Opera and
> MSIE) and others not so much: Do you know if the status bar display is using
> the page encoding or has converted the URI to UTF-8 for display?
>
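
For the display question, the two candidate interpretations of the query bytes can be compared directly. This is only an illustration of the two decodings in Python, with names of my own choosing; it says nothing about how a given status bar is implemented.

```python
from urllib.parse import unquote_to_bytes

# The escaped query string from the HTTP request in the test results.
query_bytes = unquote_to_bytes("%EF%BC%A1")

# Interpreted as UTF-8, the three bytes form a single character,
# U+FF21 FULLWIDTH LATIN CAPITAL LETTER A.
as_utf8 = query_bytes.decode("utf-8")

# Interpreted in the page encoding (iso-8859-1), each byte becomes
# its own character: U+00EF, U+00BC, U+00A1.
as_page_encoding = query_bytes.decode("iso-8859-1")
```

A display showing one fullwidth character would suggest the UTF-8 interpretation, and a display showing three Latin-1 characters would suggest the page encoding.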
>
> Best regards,
> Chris

Received on Friday, 22 July 2011 00:19:02 UTC