
RE: How browsers display IRI's with mixed encodings

From: Phillips, Addison <addison@lab126.com>
Date: Thu, 21 Jul 2011 17:20:57 -0700
To: Jungshik Shin (신정식, 申政湜) <jungshik@google.com>, Chris Weber <chris@lookout.net>
CC: "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
Message-ID: <131F80DEA635F044946897AFDA9AC3476A947B0CFB@EX-SEA31-D.ant.amazon.com>

It’s in his email. Look at the page at: http://lookout.net/test/iri/mixenc.php

Addison

From: public-iri-request@w3.org [mailto:public-iri-request@w3.org] On Behalf Of Jungshik Shin (신정식, 申政湜)
Sent: Thursday, July 21, 2011 5:18 PM
To: Chris Weber
Cc: PUBLIC-IRI@W3.ORG
Subject: Re: How browsers display IRI's with mixed encodings

Hi,

You didn't tell us exactly what you did. Could you describe your setup?

Did you put these URLs in an HTML page (in an href?)? What is the declared encoding of the HTML page: ISO-8859-1 or UTF-8?

Thanks,

Jungshik

On Thu, Jul 21, 2011 at 5:02 PM, Chris Weber <chris@lookout.net> wrote:
I'm going on a tangent from Martin's intent in the previous email, but it seems in the same vein overall. I was including some mixed-encoding tests: ISO-8859-1 mixed with UTF-8 in a hyperlink on a transitional HTML page served with an "iso-8859-1" Content-Type. The results are similar to Martin's test in that bytes forming valid UTF-8 are (most often) treated as UTF-8, even when the page encoding is iso-8859-1.

From the test page at <http://lookout.net/test/iri/mixenc.php>, Test 3 mixes the raw UTF-8 bytes for U+FF21 FULLWIDTH LATIN CAPITAL LETTER A with the raw ISO-8859-1 byte for the "ü" in "Dürst". The following hyperlink represents the test case, where <0xNN> is a raw byte.

http://www.example.com/D<0xFC>rst/?<0xEF 0xBC 0xA1>
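To make the mix concrete, here is a minimal Python sketch that builds the Test 3 hyperlink as raw bytes and checks how each part decodes. The names HREF and decodes_as_utf8 are hypothetical, chosen only for illustration; the point is that the path byte 0xFC is only meaningful as ISO-8859-1, the query bytes are only meaningful as UTF-8, and so no single encoding decodes the whole link "correctly".

```python
import unicodedata

# Hypothetical reconstruction of the Test 3 href as raw bytes:
# 0xFC is ISO-8859-1 "ü"; EF BC A1 is the UTF-8 encoding of U+FF21.
HREF = b"http://www.example.com/D" + b"\xfc" + b"rst/?" + b"\xef\xbc\xa1"

# Each part only makes sense in its own encoding.
path_byte = b"\xfc".decode("iso-8859-1")       # "ü"
query_char = b"\xef\xbc\xa1".decode("utf-8")   # U+FF21

def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# The whole link is not valid UTF-8, because 0xFC never occurs in UTF-8.
whole_is_utf8 = decodes_as_utf8(HREF)

# Read entirely as ISO-8859-1, every byte maps to one character,
# so the three query bytes become three separate Latin-1 characters.
whole_as_latin1 = HREF.decode("iso-8859-1")

print(path_byte, unicodedata.name(query_char))
```

Decoding everything as ISO-8859-1 never fails, which is why a Latin-1-only reading silently turns the UTF-8 sequence into three unrelated characters rather than raising an error.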


The results of the display are as follows.

Opera (11.50, Win7):
 http://www.example.com/DÃ¼rst/?%EF%BC%A1

Firefox (5.0, Win7):
 http://www.example.com/Dürst/?A

IE (8.0.7601.17514, Win7):
 http://www.example.com/Dürst/?ï¼¡

Chrome (12.0.742.122, Win7):
 http://www.example.com/Dürst/?A

Safari (5.0.4 (7533.20.27)):
 http://www.example.com/Dürst/?A
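One way to read these display results is sketched below in Python. This is a hypothetical heuristic for illustration only, not the documented algorithm of any of these browsers: each URL component is decoded as UTF-8 when it is well-formed UTF-8, and otherwise with the page encoding. The names PAGE_ENCODING, display_component and display_url are assumptions. The sketch also shows that reading every byte as Latin-1 reproduces the IE string, and that converting "ü" to UTF-8 and re-reading those bytes as Latin-1 yields the "Ã¼" seen in the Opera string.

```python
# Hypothetical display heuristic, for illustration only: NOT the documented
# algorithm of any browser. Each URL component is decoded as UTF-8 when it
# is well-formed UTF-8, and otherwise with the page encoding.
PAGE_ENCODING = "iso-8859-1"  # charset declared by the test page's Content-Type
HOST = "http://www.example.com"

def display_component(raw: bytes, page_encoding: str = PAGE_ENCODING) -> str:
    """Decode one URL component: UTF-8 first, page encoding as fallback."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(page_encoding)

def display_url(path: bytes, query: bytes) -> str:
    """Build a display string by decoding path and query independently."""
    return HOST + display_component(path) + "?" + display_component(query)

# Test 3 split into components (raw bytes as they appear in the page).
PATH = b"/D\xfcrst/"
QUERY = b"\xef\xbc\xa1"

# Per-component decoding: the path falls back to Latin-1, the query decodes as UTF-8.
mixed = display_url(PATH, QUERY)

# Reading every byte as Latin-1 reproduces the IE display "Dürst/?ï¼¡".
latin1_only = (HOST.encode("ascii") + PATH + b"?" + QUERY).decode(PAGE_ENCODING)

# Converting "ü" to UTF-8 and re-reading those bytes as Latin-1 gives "Ã¼",
# the mojibake seen in the Opera display.
double_decoded = PATH.decode(PAGE_ENCODING).encode("utf-8").decode(PAGE_ENCODING)
```

The per-component fallback is only one guess at how a display can end up with a Latin-1 path next to a UTF-8 query. The question at the end of this message, whether the display uses the page encoding or a conversion to UTF-8, is exactly what would distinguish such strategies.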

With the exception of IE, all of the above generated the following HTTP request:

 GET /D%C3%BCrst/?%EF%BC%A1

IE, of course, does not percent-escape the raw bytes in the query string:

 GET /D%C3%BCrst/?A
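The two request forms above can be reproduced with a short Python sketch using the standard library's urllib.parse.quote. The function name request_target and the escape_query flag are hypothetical. The sketch assumes the observed behavior: the path byte is first interpreted in the page encoding and then percent-encoded as UTF-8, while the query bytes are either percent-encoded unchanged or, IE-style, sent raw.

```python
from urllib.parse import quote

PAGE_ENCODING = "iso-8859-1"  # charset declared by the test page

def request_target(path_bytes: bytes, query_bytes: bytes, escape_query: bool = True) -> bytes:
    """Hypothetical sketch of the request-target formation observed above."""
    # Path: interpret the raw bytes in the page encoding, then percent-encode
    # the resulting characters as UTF-8 (quote() uses UTF-8 for str input).
    path = quote(path_bytes.decode(PAGE_ENCODING), safe="/")
    target = path.encode("ascii") + b"?"
    if escape_query:
        # Query: percent-encode the original bytes unchanged (no re-encoding).
        return target + quote(query_bytes, safe="").encode("ascii")
    # IE-style: append the raw query bytes without escaping them.
    return target + query_bytes

PATH = b"/D\xfcrst/"
QUERY = b"\xef\xbc\xa1"

escaped = request_target(PATH, QUERY)                     # form sent by Opera, Firefox, Chrome, Safari
raw = request_target(PATH, QUERY, escape_query=False)     # form sent by IE
```

Note that the query is never transcoded in either branch: the escaped form keeps the page's original bytes, which here happen to be UTF-8, so the path and query end up percent-encoded under different rules.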

I tried to capture some of these test results into a table form at:
<https://spreadsheets0.google.com/spreadsheet/ccc?key=0AifoWoA0trUndEZSTlRRNnd5MzE3N3RYOVlIVFFMREE&hl=en_US#gid=5>

A question for browser implementers: in some cases the answer is obvious (Opera and MSIE), in others not so much. Do you know whether the status bar display uses the page encoding, or whether it converts the URI to UTF-8 for display?

Best regards,
Chris




Received on Friday, 22 July 2011 00:21:33 UTC
