
[whatwg] Charset sniffing from XML prolog

From: Ian Hickson <ian@hixie.ch>
Date: Fri, 16 Oct 2009 21:46:16 +0000 (UTC)
Message-ID: <Pine.LNX.4.62.0910162141430.25383@hixie.dreamhostps.com>
On Wed, 7 Oct 2009, Kartikaya Gupta wrote:
> If a document is served as text/html, but contains an XML prolog with an 
> encoding attribute, it seems that Firefox, Opera, and Chrome all 
> pick up the encoding from the prolog and use it when parsing the rest of 
> the document (IE6 does not). The HTML5 spec doesn't seem to include 
> XML-prolog checking in its encoding sniffing algorithm. Should it?
> <?xml version="1.0" encoding="utf-8"?>
> <html>insert utf-8 content here, or alert(document.inputEncoding) for browsers that support it</html>
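
[Illustration of the behavior described in the quote above: a minimal Python sketch of prolog sniffing, not taken from any browser's source. The function name, the 1024-byte window, and the regular expression are all assumptions made for this example.]

```python
import re

# Assumed pattern: an XML declaration at the very start of the byte stream,
# carrying an encoding pseudo-attribute. [^>]*? keeps the search inside the
# declaration itself, so attributes on later tags are never considered.
_XML_DECL = re.compile(
    rb'^<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_prolog_encoding(data: bytes):
    """Return the lowercased encoding label from a leading XML prolog, or None."""
    # Only look at a small prefix (1024 bytes is an arbitrary choice here).
    match = _XML_DECL.match(data[:1024])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()
```

Fed the test document from the quote, this would return the label "utf-8"; a document with no prolog, or with a prolog that has no encoding attribute, yields None.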

On Thu, 8 Oct 2009, Kartikaya Gupta wrote:
> So then is this behavior getting axed or specced? The site in question 
> that relies on this behavior is http://bell.mobi/primary - it's not as 
> noticeable in the English-locale version but if you switch to a French 
> locale you get a bunch of French encoded as utf-8. Browsers with the 
> prolog sniffing will render it fine but others will show garbage.
> I'd be happier with not having to change my code to deal with this 
> website, since it will occasionally show garbage even in utf-8.

UTF-8 is detectable, so if there are no other encoding declarations, and if 
this is the only site we know of, I'd rather encourage you to add fallback 
UTF-8 detection (as allowed by the spec) rather than add this.
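
[Illustration: one way fallback UTF-8 detection could look, as a minimal Python sketch. It assumes the whole byte stream (or a prefix that does not end mid-character) is available, and that windows-1252 is the default when detection fails; the function names and that default are assumptions for this example, not something the spec or this thread prescribes.]

```python
def looks_like_utf8(data: bytes) -> bool:
    """True if data decodes as strict UTF-8 and has at least one non-ASCII byte."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    # Pure ASCII is valid UTF-8 too, but it gives no evidence either way,
    # so it is not counted as a positive detection.
    return any(byte >= 0x80 for byte in data)


def choose_fallback_encoding(data: bytes, default: str = "windows-1252") -> str:
    """Pick an encoding when no declaration was found (default is assumed)."""
    return "utf-8" if looks_like_utf8(data) else default
```

This works because text in a legacy single-byte encoding almost never happens to form valid UTF-8 multi-byte sequences: French text such as "café" encoded as ISO-8859-1 fails the strict decode, while the same text encoded as UTF-8 passes.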

Since IE apparently doesn't do this, I'd also rather not add yet more 
features like this.

So in the absence of more compelling reasons to add this, I'd rather get 
Opera and WebKit to remove the support for this, than add more. (As I 
understand it, Mozilla's new HTML5 parser already removes support for this 
particular "feature".)

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Friday, 16 October 2009 14:46:16 UTC
