[Bug 15359] Make BOM trump HTTP

From: <bugzilla@jessica.w3.org>
Date: Fri, 06 Jul 2012 02:30:39 +0000
Message-Id: <E1SmyJb-0002HY-OG@jessica.w3.org>
To: public-html-bugzilla@w3.org

--- Comment #10 from Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> 2012-07-06 02:30:39 UTC ---
(In reply to comment #9)
> Firstly, as I said, this bug covers only where there is a BOM, not where there
> is neither a BOM nor an encoding declaration nor a header.

The BOM is a side track. My question was: Do you dislike the "user experience" of
XML when it comes to its prohibition against manually overriding the encoding?
Because XML's user experience is the same regardless of whether there is a BOM
or not. If you can live with XML's _general_ prohibition against manual
encoding overriding, then I don't see why you can't also live with the same
strict rule for the _specific_ subset of HTML where there is a BOM.

> Secondly, the bug suggests ignoring headers when there is a BOM present, but
> the XML spec. *specifically* says that "external character encoding
> information" can be used to determine the encoding.

Correct. Ignoring the headers is currently also against the HTTP specs.

> So if I have:
> 0xFE 0xFF <?xml encoding="ISO-8859-1"?>

1) This XML declaration is invalid as it lacks the version attribute.
2) There are two characters, 0xFE 0xFF, in front of the declaration. 

Note regarding 2): As I have tried to say before, section 4.3.3 of the XML spec states that:

     "It is a fatal error for a TextDecl to occur other than at the 
      beginning of an external entity."
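
To make point 2) concrete, here is a minimal Python sketch. It assumes the parser honours the declared ISO-8859-1 label; the variable names are only illustrative. Each byte then maps to exactly one character, so the two leading bytes end up as real characters in front of the declaration:

```python
# The bytes from the example: 0xFE 0xFF followed by an ASCII XML declaration.
data = b'\xfe\xff<?xml encoding="ISO-8859-1"?>'

# Decode as ISO-8859-1, as the declaration asks. Every byte maps to one
# character, so 0xFE and 0xFF become U+00FE and U+00FF.
text = data.decode("iso-8859-1")

leading = text[:2]
starts_with_declaration = text.startswith("<?xml")
print(repr(leading), starts_with_declaration)
```

The decoded entity does not begin with the declaration, which is the situation the quoted sentence from 4.3.3 is about.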

> It is a fatal error to decode it as UTF-16. Sure, this causes other problems,
> but not necessarily fatal errors (at least in XML 1.0).

Wrong. See my 'Note regarding 2)' above.

> Your arguments over using the UI to configure the charset are not within the
> original scope of this bug.

This bug requests the behaviour of IE and Webkit to be standardized. And IE and
Webkit do prohibit manual overriding. Meanwhile, the bug requester has since
written the 'Encoding' standard, in which he states: [*]

   "For compatibility with deployed content, the byte order mark (also
    known as BOM) is considered more authoritative than anything else."

[*] http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html#decode-and-encode
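
As a rough illustration of what "more authoritative than anything else" means in practice, here is a Python sketch. It is not the text of the Encoding standard; the function name `pick_encoding`, its parameters and the windows-1252 fallback are my own assumptions:

```python
import codecs

# BOM signatures to look for, with the encoding each one implies.
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)

def pick_encoding(body, http_charset=None, fallback="windows-1252"):
    """Return the encoding to decode with; a BOM beats the HTTP charset."""
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    # No BOM: only now does the HTTP header (or a fallback) get a say.
    return http_charset or fallback

print(pick_encoding(codecs.BOM_UTF8 + b"<p>hi</p>", http_charset="iso-8859-1"))
print(pick_encoding(b"<p>hi</p>", http_charset="iso-8859-1"))
```

With a BOM present, the HTTP charset is never consulted; without one, it is used as before.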

> Thirdly, as I said, not all HTML will be XML.

There is a benefit - security-wise and practical - in preventing users
from "disturbing" the encoding regardless of whether the page is XML or HTML.

>> By the way: In that case it is an illegal character per HTML5 as well: A UTF-8
>> document with a BOM would bring the browser into Quirks-Mode if the
>> browser reads the document as - for example - ISO-8859-1.
> Yes, it would typically trigger Quirks mode (except in some, perhaps only
> theoretical, encodings). That's not a fatal error though.

So, now you are offering me at least one use case: To allow users to
place the page in quirks-mode. Frankly: I dismiss that use case.
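
For reference, a small Python sketch of what the override does to the start of such a page. The sample markup is made up, and the sketch only shows the decoded text, not what any particular browser then does with it:

```python
import codecs

# A hypothetical UTF-8 page that starts with a BOM and a doctype.
page = codecs.BOM_UTF8 + b"<!DOCTYPE html><title>x</title>"

# Forced to ISO-8859-1, the three BOM bytes become three visible characters.
as_latin1 = page.decode("iso-8859-1")

# Decoded as UTF-8 with BOM handling ("utf-8-sig"), the BOM is dropped.
as_utf8 = page.decode("utf-8-sig")

print(repr(as_latin1[:3]), as_latin1.startswith("<!DOCTYPE"))
print(as_utf8.startswith("<!DOCTYPE"))
```

Under the forced encoding, the doctype is no longer the first thing in the document.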

>  but you must be careful only to read what the spec. actually says.

A hilarious comment. But excellent advice.

Received on Friday, 6 July 2012 02:30:40 UTC