Re: How browsers display IRI's with mixed encodings from Chris Weber on 2011-07-22 (public-iri@w3.org from July 2011)

From: Chris Weber <chris@lookout.net>
Date: Thu, 21 Jul 2011 19:30:03 -0700
To: public-iri@w3.org
Message-ID: <4E28E0AB.4010000@lookout.net>

On 7/21/2011 5:29 PM, Jungshik Shin (신정식, 申政湜) wrote:
> I think your html page declared its encoding to be in ISO-8859-1. Then,
> it's not a mixed encoding because xEF xBC xA1 is a perfectly fine
> ISO-8859-1 sequence.
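To make the point above concrete, here is a small Python check (not part of the original thread) showing that the bytes xEF xBC xA1 decode cleanly in both encodings. Under ISO-8859-1 they are three separate characters; under UTF-8 they are the single character U+FF21.

```python
# The three bytes from the test case discussed in this thread.
raw = b"\xef\xbc\xa1"

# ISO-8859-1 maps every byte to exactly one character, so any byte
# sequence is "valid" in that encoding.
as_latin1 = raw.decode("iso-8859-1")

# Read as UTF-8, the same three bytes form one multi-byte sequence.
as_utf8 = raw.decode("utf-8")

print(repr(as_latin1))    # 'ï¼¡'
print(hex(ord(as_utf8)))  # 0xff21 (FULLWIDTH LATIN CAPITAL LETTER A)
```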

Right, I used "mixed encodings" deliberately, as something of a misnomer. 
I thought I had signaled that by noting that the test case included 
"bytes representing UTF-8" within an ISO-8859-1 encoded document.

> The above is Chrome's internal representation of the URL in question
> (aside from the spec+ host part). When displaying the URL in the
> omnibox,  the path part is always interpreted as UTF-8. The query part
> is tested for 'UTF8ness' (after unescaping). If it *can* be interpreted
> as UTF-8, it's converted to characters. Otherwise, it remains %-escaped
> in the display.
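The query-display rule described above can be approximated roughly as follows. This is a minimal sketch, not Chrome's code: `display_query` is a hypothetical helper, and it unescapes the whole query string at once, ignoring reserved characters such as %26 or %3D that real display logic would need to handle with care. Per the description, the path needs no such test, since it is always interpreted as UTF-8.

```python
from urllib.parse import unquote_to_bytes


def display_query(escaped_query: str) -> str:
    """Hypothetical helper approximating the described rule: unescape
    the query, then show characters only if the resulting bytes are
    valid UTF-8; otherwise keep the original %-escaped form."""
    raw = unquote_to_bytes(escaped_query)
    try:
        # The bytes pass the 'UTF8ness' test: display as characters.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: leave the query %-escaped in the display.
        return escaped_query


print(display_query("q=%EF%BC%A1"))  # q=Ａ
print(display_query("q=%EF%BC"))     # q=%EF%BC (truncated UTF-8 sequence)
```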

That was the point of the test, though I may not have described it 
well: to test how the path and query parts are displayed when they 
contain unescaped 'UTF8ness'.

Thanks for the feedback,
-Chris

Received on Friday, 22 July 2011 02:30:30 UTC