
Re: How browsers display IRI's with mixed encodings

From: Chris Weber <chris@lookout.net>
Date: Thu, 21 Jul 2011 19:30:03 -0700
Message-ID: <4E28E0AB.4010000@lookout.net>
To: public-iri@w3.org

On 7/21/2011 5:29 PM, Jungshik Shin (신정식, 申政湜) wrote:
> I think your html page declared its encoding to be in ISO-8859-1. Then,
> it's not a mixed encoding because xEF xBC xA1 is a perfectly fine
> ISO-8859-1 sequence.

Right, I used "mixed encodings" deliberately as a misnomer of sorts, 
which I thought I had alluded to by mentioning that the test reference 
included "bytes representing UTF-8" within an ISO-8859-1 encoded document.
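
To make the byte-level point concrete, here is a minimal Python sketch 
(not part of the original test page) that decodes the three bytes 
mentioned above under both encodings. The variable names are only 
illustrative.

```python
# The three bytes discussed in this thread.
raw = bytes([0xEF, 0xBC, 0xA1])

# ISO-8859-1 maps every byte to one character, so decoding never fails.
as_latin1 = raw.decode("iso-8859-1")

# Read as UTF-8, the same bytes form a single three-byte sequence.
as_utf8 = raw.decode("utf-8")

print([f"U+{ord(c):04X}" for c in as_latin1])  # three code points
print([f"U+{ord(c):04X}" for c in as_utf8])    # one code point
```

Under ISO-8859-1 the bytes are three separate characters (U+00EF, 
U+00BC, U+00A1); under UTF-8 they are the single character U+FF21, 
FULLWIDTH LATIN CAPITAL LETTER A. Both readings are valid, which is why 
"mixed encodings" is a misnomer here.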

> The above is Chrome's internal representation of the URL in question
> (aside from the spec+ host part). When displaying the URL in the
> omnibox,  the path part is always interpreted as UTF-8. The query part
> is tested for 'UTF8ness' (after unescaping). If it *can* be interpreted
> as UTF-8, it's converted to characters. Otherwise, it remains %-escaped
> in the display.

That was the point of the test, which I may have failed to describe 
clearly: to test the display of the path and query parts when they 
contain unescaped 'UTF8ness'.
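
The query-display rule quoted above could be sketched roughly as 
follows. This is only an illustration of the described behavior, not 
Chrome's actual code; the function name display_query is hypothetical, 
and the sketch ignores details a real browser must handle, such as 
keeping reserved characters escaped.

```python
from urllib.parse import unquote_to_bytes


def display_query(escaped: str) -> str:
    """Return a display form of a %-escaped query string.

    Hypothetical helper: unescape to bytes, then show characters only
    if the bytes are valid UTF-8; otherwise keep the escaped form.
    """
    raw = unquote_to_bytes(escaped)
    try:
        # Strict decoding raises on any byte run that is not valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not interpretable as UTF-8: leave the %-escapes in the display.
        return escaped


print(display_query("q=%EF%BC%A1"))  # valid UTF-8, shown as a character
print(display_query("q=%EF%BC"))     # truncated sequence, stays escaped
```

With this sketch, "q=%EF%BC%A1" is displayed with the fullwidth letter 
U+FF21, while the truncated "q=%EF%BC" stays %-escaped because it 
cannot be interpreted as UTF-8.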

Thanks for the feedback,
Received on Friday, 22 July 2011 02:30:30 UTC
