
[Bug 15359] Make BOM trump HTTP

From: <bugzilla@jessica.w3.org>
Date: Tue, 27 Nov 2012 00:57:08 +0000
To: www-international@w3.org
Message-ID: <bug-15359-4285-SfS8a9NPXT@http.www.w3.org/Bugs/Public/>

--- Comment #25 from Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> ---
(In reply to comment #22)
> The initial impetus for the change appears to be that Trident and WebKit
> supported different behaviour than the spec. If paving the cowpath is the
> main motivator, we have a problem, since Trident (in IE10) prioritises HTTP
> over BOM. 
> Should we switch back to the previous approach in the light of Trident?
> Also, whilst I suspect that this may in fact make life easier for HTML pages
> for most cases, I wonder how much discussion has taken place about the
> implications wrt other formats. Afaik the i18n folks were unaware of the
> change, so we haven't discussed. Has anyone actually discussed Anne's
> proposal with the CSS and XML people?

The initial impetus is bug 12897
(https://www.w3.org/Bugs/Public/show_bug.cgi?id=12897), which I filed in June
2011. You will see, if you read that long and - ahem - convoluted bug, that I
cared a lot about checking XML parser behavior (see below).

When I worked on bug 12897, I also discussed it over at www-international@.
Thus the i18n community was made aware of it long ago.

It is a pity that IE10 stopped being compatible with itself. However, IE has
not - yet - started to grow in popularity, so that is less of an argument now,
in a way. (But Anne has a point in that web compatibility should matter most.)

(In reply to comment #23)
> Btw, see
> http://w3c-test.org/framework/details/i18n-html5/character-encoding-034 for
> test results on various platforms. The test asserts the expected result from
> before the spec was changed, so for the current spec text a pass is a fail
> and vice versa.

There you only test HTML browser behavior. In bug 12897, I also tested XML
parser behavior. Thus it is not true that just Trident and WebKit started this.
(See below.)

(In reply to comment #24)
> The move from Trident was probably more to comply with the specification
> than to not break legacy content. The latest CSS drafts have been updated to
> take the Encoding Standard into account. Dunno about XML, but it should
> follow suit.

XML parsers/editors have, partly, started to follow suit. In comment 10 of bug
12897, I wrote:

]] * Parsers *not* implementing RFC3023 (thus giving priority to document data
instead), and which do not emit fatal errors: Webkit, Xerces C++, XMLMind
Editor on Mac (based on Xerces Java), RXP, oXygen [[

Thus, the above browsers/parsers adhere to the BOM rather than to HTTP.

In that bug I also noted that Libxml2 was the *only* XML parser I found (apart
from Firefox and Opera, according to how they behaved then) which gave priority
to HTTP. But in the next comment, I reported that Libxml2, for files stored in
a file system (file:// URL), "ignores the UTF-8 BOM. And obeyes the XML
encoding declaration." Thus, things are quite turned on their head.

In a bold move, I tried to file bugs against XML parsers, to get them to
give HTTP the highest priority. And I know that Xerces C++ actually started on
it (but I hope they did not finish it). I also contacted oXygen and XMLMind,
but they were not enthusiastic about adhering to RFC3023. Quite the contrary,
actually.

I also, btw, discovered that the XML working group had pretty much not given a
damn about having a test suite for this - I think there was only one relevant
BOM test in the entire suite (submitted by John Cowan). Which is one probable
reason why the uniformity is so bad.

Anyway, and as a summary: XML parsers' handling of the BOM, especially (perhaps)
the UTF-8 BOM, is quite messy, actually, and there is, for XML, lots to ask for
with regard to unified behavior when it comes to which encoding declaration
method has priority.
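
To make the priority question concrete, here is a minimal sketch of the two
orderings discussed above: BOM first (what this bug asks for) versus HTTP
first (the ordering associated here with RFC3023). Everything in it is an
assumption made for illustration: the function names, the `bom_first` flag,
the UTF-8 fallback, and the restriction to UTF-8/UTF-16 BOMs. The XML
declaration check only works for ASCII-compatible byte streams. It does not
model any particular parser.

```python
import codecs
import re

# BOM signatures considered in this sketch (UTF-8 and UTF-16 only).
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)

# Matches an XML declaration's encoding pseudo-attribute in ASCII-compatible bytes.
_XML_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def xml_declared(data: bytes):
    """Return the encoding named in the XML declaration, or None."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]  # the declaration follows the BOM
    match = _XML_DECL.match(data)
    return match.group(1).decode("ascii").lower() if match else None


def pick_encoding(data: bytes, http_charset=None, bom_first=True):
    """Pick an encoding; bom_first=False puts the HTTP charset ahead of the BOM."""
    bom = sniff_bom(data)
    http = http_charset.lower() if http_charset else None
    declared = xml_declared(data)
    order = (bom, http, declared) if bom_first else (http, bom, declared)
    for candidate in order:
        if candidate:
            return candidate
    return "utf-8"  # hypothetical fallback chosen for this sketch


doc = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="ISO-8859-1"?><r/>'
print(pick_encoding(doc, "windows-1252"))                   # BOM-first ordering
print(pick_encoding(doc, "windows-1252", bom_first=False))  # HTTP-first ordering
```

With the same bytes and the same HTTP header, the two orderings pick
different encodings, which is exactly the kind of disagreement between
parsers described above.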

Received on Tuesday, 27 November 2012 00:57:13 UTC
