Re: How browsers display IRIs with mixed encodings

From: Chris Weber <chris@lookout.net>
Date: Thu, 21 Jul 2011 19:46:54 -0700
Message-ID: <4E28E49E.9020501@lookout.net>
To: public-iri@w3.org
On 7/21/2011 5:30 PM, Phillips, Addison wrote:
>
> Looking at your test page, I'm not sure how valid a test it is. The page declares an encoding of ISO 8859-1. Having a "UTF-8 encoded path" in the page is a lie. Those bytes are all valid windows-1252 characters (per HTML5, nearly all browsers treat ISO8859-1 as windows-1252). So the path isn't actually "UTF-8 encoded". To me the test looks broken.
>

I was calling attention to Test 3, which tested "UTF8ness" (as
Jungshik put it) in the query component.  It sounds like you're
referring to Test 1, which had "UTF8ness" in the path.  There, of
course, you're right: calling it a UTF-8 encoded path is a lie, and the
label should read something more like "Contains a byte sequence which
is also valid UTF-8".
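To make the Test 1 point concrete, here is a minimal Python sketch (Python is an editorial choice; the tests themselves were HTML pages). It assumes the character in question is U+FF21, borrowed from Test 3, and shows that its UTF-8 bytes also decode cleanly as windows-1252, the encoding HTML5 uses for pages labelled ISO-8859-1.

```python
# Assumption: U+FF21 FULLWIDTH LATIN CAPITAL LETTER A, as in Test 3.
raw = "\uFF21".encode("utf-8")  # b'\xef\xbc\xa1'

# The same three bytes are also valid windows-1252, so a page declared as
# ISO-8859-1 (treated as windows-1252 per HTML5) sees three Latin-1 letters.
as_utf8 = raw.decode("utf-8")            # one character: U+FF21
as_cp1252 = raw.decode("windows-1252")   # three characters: U+00EF U+00BC U+00A1

print(raw)
print(as_utf8)
print(as_cp1252)
```

Neither decode raises an error, which is exactly why the bytes can only be described as "also valid UTF-8", not as "UTF-8 encoded".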

The point of this was to test the display as Martin had, but using
unescaped bytes.  From the results of Test 3, it looks like Firefox,
Chrome, and Safari all check for "UTF8ness" in the query component when
displaying the IRI, regardless of the page encoding, which is why you
can see U+FF21 FULLWIDTH LATIN CAPITAL LETTER A rendered.  Opera and
MSIE do not; they show you (a) the percent-encoded bytes and (b) the
bytes interpreted in their page encoding, respectively.  Do you agree
with that assessment?
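The three display behaviours described above can be modelled roughly as follows. This is a sketch under stated assumptions, not any browser's actual code: `display_query` and the strategy names are hypothetical, it works on raw query bytes, and the fallback to percent-encoding when the bytes are not valid UTF-8 in the "utf8-check" strategy is an assumption that Test 3 did not measure.

```python
from urllib.parse import quote_from_bytes


def display_query(query_bytes: bytes, page_encoding: str, strategy: str) -> str:
    """Hypothetical model of how a browser might display a query component."""
    if strategy == "utf8-check":
        # Observed for Firefox, Chrome, Safari in Test 3: show decoded
        # characters when the bytes are well-formed UTF-8.
        try:
            return query_bytes.decode("utf-8")
        except UnicodeDecodeError:
            # Assumed fallback (not tested): percent-encode the bytes.
            return quote_from_bytes(query_bytes, safe="=&")
    if strategy == "percent-encode":
        # Observed for Opera: show percent-encoded bytes.
        return quote_from_bytes(query_bytes, safe="=&")
    if strategy == "page-encoding":
        # Observed for MSIE: interpret the bytes in the page encoding.
        return query_bytes.decode(page_encoding, errors="replace")
    raise ValueError(f"unknown strategy: {strategy}")


query = b"q=" + "\uFF21".encode("utf-8")
for s in ("utf8-check", "percent-encode", "page-encoding"):
    print(s, display_query(query, "windows-1252", s))
```

With a windows-1252 page, the three strategies yield `q=Ａ`, `q=%EF%BC%A1`, and `q=ï¼¡` respectively, matching the Firefox/Chrome/Safari, Opera, and MSIE results as assessed above.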

-Chris
Received on Friday, 22 July 2011 02:47:29 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 16:14:42 UTC