[whatwg] Charset sniffing from XML prolog from Kartikaya Gupta on 2009-10-08 (public-whatwg-archive@w3.org from October 2009)

From: Kartikaya Gupta <lists.whatwg@stakface.com>
Date: Thu, 08 Oct 2009 01:29:17 +0000
Message-ID: <20091008012918.2E9F63BB3D@looneymail-mx1.g.dreamhost.com>

On Wed, 07 Oct 2009 20:23:35 -0400, Boris Zbarsky <bzbarsky at MIT.EDU> wrote:
> On 10/7/09 7:51 PM, Kartikaya Gupta wrote:
> > I tried it again in Chrome and if I paste the above in the address bar I get US-ASCII. But if I save it to a file and then load it I get UTF-8. I checked the headers being sent from Apache and they don't include any sneaky encoding hints, just Content-Type: text/html.
> 
> Can you attach the exact file you saved?  Does it have a BOM, perchance?
> 
> 

No BOM (I created the files using vim, and checked them with xxd).
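
For anyone who wants to repeat the BOM check without xxd, a minimal sketch in Python follows. The function name `detect_bom` is hypothetical and not part of any tool mentioned in this thread; it only compares the leading bytes against the BOM constants in the standard `codecs` module.

```python
import codecs

# Known byte order marks. The UTF-32 marks come first because the
# UTF-32LE BOM begins with the same two bytes as the UTF-16LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def detect_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None
```

Reading the first four bytes of a saved file in binary mode and passing them to `detect_bom` is enough; a result of `None` corresponds to the "No BOM" finding above.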

Using document.inputEncoding:
http://stakface.com/pub/mango/fakexml.html
http://stakface.com/pub/mango/fakexml_iso.html

Using a degree symbol in UTF-8:
http://stakface.com/pub/mango/fakexml2.html
http://stakface.com/pub/mango/fakexml2_iso.html
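
A degree symbol works as a visual probe because its UTF-8 bytes render differently when decoded as ISO-8859-1. The following Python sketch shows the effect on the bytes alone; it does not reproduce the contents of the test pages.

```python
# U+00B0 DEGREE SIGN is two bytes in UTF-8.
degree = "\u00b0"
utf8_bytes = degree.encode("utf-8")          # b'\xc2\xb0'

# Decoded with the right encoding, the original character comes back.
as_utf8 = utf8_bytes.decode("utf-8")         # one character: the degree sign

# Decoded as ISO-8859-1, each byte becomes its own character,
# so an extra A-circumflex (U+00C2) appears before the degree sign.
as_latin1 = utf8_bytes.decode("iso-8859-1")  # two characters

print(repr(as_utf8), repr(as_latin1))
```

So a page that shows a lone degree sign was decoded as UTF-8, and one that shows a stray "Â" in front of it was decoded as ISO-8859-1.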

In both cases, the _iso version has a tweaked prolog that makes Firefox go back to ISO-8859-1. Chrome still detects fakexml_iso.html as UTF-8. I've now also tested in Firefox on Mac (Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3), which also has a default encoding of ISO-8859-1 as per the preferences.
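
To illustrate the general idea of sniffing a charset from an XML prolog, here is a hypothetical Python sketch. It is an assumption-laden illustration only: it is not the algorithm that Firefox or Chrome implements, the function name `sniff_prolog_encoding` is invented, and it only handles ASCII-compatible input with the declaration at the very start of the byte stream.

```python
import re

# Matches an XML declaration at the very start of the data and captures
# the value of its encoding pseudo-attribute (ASCII-compatible input only).
XML_DECL = re.compile(
    rb'^<\?xml\s+version\s*=\s*["\'][^"\']*["\']'
    rb'\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_prolog_encoding(data: bytes):
    """Return the encoding named in a leading XML declaration, or None."""
    match = XML_DECL.match(data)
    if match is None:
        return None
    return match.group(1).decode("ascii")
```

With a strict matcher like this one, even a small change to the declaration (for example, anything placed before `<?xml`) makes the function return `None`, which is one way a "tweaked" prolog could stop being recognized.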

kats

Received on Wednesday, 7 October 2009 18:29:17 UTC