
[Bug 15359] Make BOM trump HTTP

From: <bugzilla@jessica.w3.org>
Date: Tue, 27 Nov 2012 00:57:08 +0000
To: www-international@w3.org
Message-ID: <bug-15359-4285-SfS8a9NPXT@http.www.w3.org/Bugs/Public/>
https://www.w3.org/Bugs/Public/show_bug.cgi?id=15359

--- Comment #25 from Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> ---
(In reply to comment #22)
> The initial impetus for the change appears to be that Trident and WebKit
> supported different behaviour than the spec. If paving the cowpath is the
> main motivator, we have a problem, since Trident (in IE10) prioritises HTTP
> over BOM. 
> 
> Should we switch back to the previous approach in the light of Trident?
> 
> Also, whilst I suspect that this may in fact make life easier for HTML pages
> for most cases, I wonder how much discussion has taken place about the
> implications wrt other formats. Afaik the i18n folks were unaware of the
> change, so we haven't discussed. Has anyone actually discussed Anne's
> proposal with the CSS and XML people?

The initial impetus is bug 12897
(https://www.w3.org/Bugs/Public/show_bug.cgi?id=12897), which I filed in June
2011. If you read that long and - ahem - convoluted bug, you will see that I
cared a lot about checking XML parser behavior (see below).

When I worked on bug 12897, I also discussed it over at www-international@.
So the i18n community was made aware of this long ago.

It is a pity that IE10 stopped being compatible with itself. However, IE has
not - yet - started to grow in popularity, so in a way that is less of an
argument now. (But Anne has a point in that web compatibility should matter
most.)

(In reply to comment #23)
> Btw, see
> http://w3c-test.org/framework/details/i18n-html5/character-encoding-034 for
> test results on various platforms. The test asserts the expected result from
> before the spec was changed, so for the current spec text a pass is a fail
> and vice versa.

There you only test HTML browser behavior. In bug 12897, I also tested XML
parser behavior, so it is not true that only Trident and WebKit started this.
(See below.)

(In reply to comment #24)
> The move from Trident was probably more to comply with the specification
> than to not break legacy content. The latest CSS drafts have been updated to
> take the Encoding Standard into account. Dunno about XML, but it should
> follow suit.

XML parsers/editors have, in part, started to follow suit. In comment 10 of bug
12897, I wrote:

]] * Parsers *not* implementing RFC3023 (thus giving priority to document data
instead), and which do not emit fatal errors: Webkit, Xerces C++, XMLMind
Editor on Mac (based on Xerces Java), RXP, oXygen [[

Thus, the above browsers/parsers adhere to the BOM rather than to HTTP.
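To make the two priority orders concrete, here is a minimal Python sketch. The names `sniff_bom` and `pick_encoding` and the policy labels `"http_first"` (RFC3023 style: the HTTP charset wins) and `"bom_first"` ("BOM trumps HTTP", as the parsers listed above behave) are hypothetical, chosen only for illustration. It sniffs just the UTF-8 and UTF-16 BOMs and ignores the XML encoding declaration entirely.

```python
import codecs
from typing import Optional

# BOM byte sequences mapped to encoding labels (UTF-8, UTF-16BE, UTF-16LE).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding indicated by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def pick_encoding(data: bytes, http_charset: Optional[str], policy: str) -> Optional[str]:
    """Choose between the BOM and the HTTP charset parameter (hypothetical helper).

    policy="http_first": the HTTP charset wins whenever it is present.
    policy="bom_first":  a BOM wins whenever it is present.
    """
    bom = sniff_bom(data)
    if policy == "http_first":
        return http_charset or bom
    if policy == "bom_first":
        return bom or http_charset
    raise ValueError(f"unknown policy: {policy}")


# A UTF-8 body with a BOM, served with a conflicting HTTP charset.
body = codecs.BOM_UTF8 + b"<p>caf\xc3\xa9</p>"
print(pick_encoding(body, "iso-8859-1", "http_first"))  # iso-8859-1
print(pick_encoding(body, "iso-8859-1", "bom_first"))   # utf-8
```

The two policies only disagree when both signals are present and conflict, which is exactly the case the bug is about.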

In that bug I also noted that Libxml2 was the *only* XML parser I found (apart
from Firefox and Opera, as they behaved back then) which gave priority to
HTTP. But in the next comment, I reported that Libxml2, for files stored in a
file system (file:// URL), "ignores the UTF-8 BOM. And obeys the XML encoding
declaration." So there, its behavior is turned quite upside down.

In a bold move, I tried to file bugs against XML parsers to get them to give
HTTP the highest priority. And I know that Xerces C++ actually started on it
(but I hope they did not finish it). I also contacted oXygen and XMLMind, but
they were not enthusiastic about adhering to RFC3023. Quite the contrary,
actually.

I also, btw, discovered that the XML working group had pretty much not cared
about having a test suite for this - I think there was only one relevant BOM
test in the entire suite (submitted by John Cowan). That is one probable
reason why uniformity is so poor.

Anyway, to summarize: XML parsers' handling of the BOM, especially (perhaps)
the UTF-8 BOM, is quite messy, and for XML there is a lot left to ask for
with regard to unified behavior on which encoding declaration method has
priority.

Received on Tuesday, 27 November 2012 00:57:13 GMT