
[Bug 15359] Make BOM trump HTTP

From: <bugzilla@jessica.w3.org>
Date: Thu, 05 Jul 2012 12:44:25 +0000
Message-Id: <E1SmlQ1-0002ap-5v@jessica.w3.org>
To: public-html-bugzilla@w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=15359

--- Comment #7 from Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> 2012-07-05 12:44:24 UTC ---
(In reply to comment #6)

To put it politely: You have not read the XML 1.0 spec correctly.

Section '4.3.3 Character Encoding in Entities' is NORMATIVE. [1] Whereas the
section you appear to be talking about is Appendix F. [2] Appendix F only
contains helpful instructions/tips for how to fulfill the requirements of
section 4.3.3.

[1] http://www.w3.org/TR/REC-xml/#charencoding
[2] http://www.w3.org/TR/REC-xml/#sec-guessing

Appendix F.1 first - in the first table - discusses how to sniff the encoding
when there is a BOM. This is simple: If one parses an XML document which
_contains_ the BOM as something other than UTF-16, UTF-8 or UTF-32 (UCS-4),
then the BOM is not a BOM but an illegal character = fatal error.
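
As a rough illustration, the BOM rows of that first table can be sketched in
Python. The function name sniff_bom and the codec labels it returns are
assumptions of this sketch, not names from the spec, and the two unusual UCS-4
octet orders (2143 and 3412) are left out for brevity.

```python
from typing import Optional

# BOM byte patterns from the first table of XML 1.0 Appendix F.1.
# Four-byte patterns come first so that UTF-32 LE (FF FE 00 00)
# is not mistaken for UTF-16 LE (FF FE).
_BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

A parser that gets a non-None answer here and then decodes the document as
some other encoding would see the BOM bytes as ordinary (illegal) characters.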

Then - in the second table - it discusses how to sniff it when there is no BOM.
This is also simple: Except for UTF-8, it is impossible, per the rules that XML
operates with.

And therefore, in the second table of Appendix F.1, each row (except the last
row about UTF-8!) ends roughly the same way: "The encoding declaration must be
read to determine" (the encoding). And if there is no encoding declaration,
then it is a fatal error, per section 4.3.3. Section 4.3.3 is also clear about
the fact that if there is an external (typically HTTP) or internal encoding
declaration, and this declaration turns out to be incorrect, then it is a fatal
error. A complete lack of encoding information (no BOM and no declaration) is
treated as a signal that the document is UTF-8 encoded.
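
The no-BOM case can be sketched the same way. This is only an illustration:
classify_without_bom and the family labels are hypothetical names, and the two
unusual UCS-4 octet orders from the second table are again omitted. The second
element of the result says whether the encoding declaration still has to be
read.

```python
from typing import Tuple

# First four bytes of "<?xm" (or "<" / "<?" in wider code units), as listed
# in the second table of XML 1.0 Appendix F.1.
_NO_BOM = [
    (b"\x00\x00\x00\x3c", "UCS-4 big-endian family"),
    (b"\x3c\x00\x00\x00", "UCS-4 little-endian family"),
    (b"\x00\x3c\x00\x3f", "UTF-16 big-endian family"),
    (b"\x3c\x00\x3f\x00", "UTF-16 little-endian family"),
    (b"\x3c\x3f\x78\x6d", "ASCII-compatible family"),
    (b"\x4c\x6f\xa7\x94", "EBCDIC family"),
]


def classify_without_bom(data: bytes) -> Tuple[str, bool]:
    """Return (family, must_read_declaration) for a document with no BOM."""
    for prefix, family in _NO_BOM:
        if data.startswith(prefix):
            # Only the family is known; the declaration names the encoding.
            return family, True
    # No BOM and no recognisable XML declaration: the "other" row, UTF-8.
    return "utf-8", False
```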

The effect of all this is that, in XML, it should always cause a fatal error if
you try to override the encoding. Firefox is probably one of the XML parsers
that _best_ reflects XML's encoding rules. So if you are in doubt, I suggest
that you do some experiments for yourself.
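
One such experiment, sketched here with Python's standard-library parser
(expat) rather than Firefox, so the result only shows how that one parser
behaves: a document whose internal declaration says UTF-8 but whose bytes are
really Latin-1 is rejected outright. The helper name is_fatal is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_fatal(document: bytes) -> bool:
    """Return True if the parser rejects the document with a parse error."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return True
    return False


# 0xE9 is "e with acute" in ISO-8859-1 but an invalid sequence in UTF-8.
wrong_label = b'<?xml version="1.0" encoding="utf-8"?><a>\xe9</a>'
right_label = b'<?xml version="1.0" encoding="iso-8859-1"?><a>\xe9</a>'

print(is_fatal(wrong_label))
print(is_fatal(right_label))
```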
