
Re: Should the UTF-8 BOM trump overriding via HTTP or by users?

From: John Cowan <cowan@mercury.ccil.org>
Date: Tue, 7 Jun 2011 13:41:56 -0400
To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, www-international <www-international@w3.org>
Message-ID: <20110607174156.GC12202@mercury.ccil.org>
Leif Halvard Silli scripsit:

> Anyway, the above is only a general reasoning (which makes lots of 
> sense). But where is this "über spec" which says that that is how 
> it should work? The only one I have found is XML 1.0, which says that, 
> when there is external encoding information which conflicts with the 
> BOM or the XML declaration, then, quote:
> 
> ]]
> In the interests of interoperability, however, the following rule is 
> recommended.
> 	*	If an XML entity is in a file, the Byte-Order Mark and encoding 
> declaration are used (if present) to determine the character encoding.
> [[
> 
> Note that this means that if the document has no BOM or encoding 
> declaration, then the HTTP header will win even though UTF-8 is the 
> default encoding.
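The precedence argued in the quoted message can be written down as a small function. This is only a sketch of that reading, under the assumption that the BOM and any encoding declaration have already been extracted. It is not something XML 1.0 mandates for HTTP delivery. The name `pick_encoding` and its parameters are hypothetical.

```python
from typing import Optional


def pick_encoding(bom_encoding: Optional[str],
                  declared_encoding: Optional[str],
                  http_charset: Optional[str]) -> str:
    """Hypothetical helper modelling the reading quoted above:
    BOM first, then the encoding declaration, then the HTTP charset,
    and only then XML's UTF-8 default."""
    # In-document signals first, as the quoted Appendix F rule recommends.
    if bom_encoding:
        return bom_encoding
    if declared_encoding:
        return declared_encoding
    # With neither a BOM nor a declaration, the external (HTTP) charset
    # wins over the UTF-8 default, which is the point being made.
    if http_charset:
        return http_charset
    return "UTF-8"
```

For example, `pick_encoding(None, None, "ISO-8859-1")` returns `"ISO-8859-1"`, not `"UTF-8"`. That is the outcome the message finds surprising.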

Did you paste the wrong quotation?  That explicitly refers to XML entities
in files; i.e. without HTTP metadata.

In any case, Appendix F is non-normative.  The algorithm described in
http://recycledknowledge.blogspot.com/2005/07/hello-i-am-xml-encoding-sniffer.html ,
which has no authority except my own, allows an 8-BOM (UTF-8 byte order mark) to override any
XML declaration.  It doesn't handle XML parsed entities.
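The rule that a BOM overrides any XML declaration can be illustrated with a minimal byte-level sniffer. This is a sketch under stated assumptions, not the algorithm from the blog post linked above. It checks a fixed list of BOMs and, if none is present, reads an encoding declaration only from ASCII-compatible bytes at the very start. It skips Appendix F's BOM-less UTF-16/UTF-32 patterns and, like that algorithm, ignores parsed entities. The names `sniff`, `_BOMS` and `_DECL` are hypothetical.

```python
import re
from typing import Optional

# Byte-order marks, longest first, so that the UTF-32LE mark (FF FE 00 00)
# is not mistaken for the UTF-16LE mark (FF FE).
_BOMS = [
    (b"\x00\x00\xfe\xff", "UTF-32BE"),
    (b"\xff\xfe\x00\x00", "UTF-32LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
]

# encoding="..." or encoding='...' inside an XML declaration at offset 0.
_DECL = re.compile(
    rb"""<\?xml\s[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def sniff(data: bytes) -> Optional[str]:
    """Return the encoding implied by a BOM, else by an ASCII-compatible
    XML declaration, else None (the caller chooses the fallback)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            # A BOM wins: any encoding declaration after it is ignored.
            return name
    m = _DECL.match(data)
    if m:
        return m.group(1).decode("ascii")
    return None
```

With this sketch, a document that starts with a UTF-8 BOM but declares `encoding="ISO-8859-1"` is reported as `"UTF-8"`. Returning `None` instead of a default leaves any HTTP or user override policy to the caller.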

-- 
John Cowan          http://www.ccil.org/~cowan        cowan@ccil.org
To say that Bilbo's breath was taken away is no description at all.  There are
no words left to express his staggerment, since Men changed the language that
they learned of elves in the days when all the world was wonderful. --The Hobbit
Received on Tuesday, 7 June 2011 17:42:19 GMT