[Bug 15359] Make BOM trump HTTP

https://www.w3.org/Bugs/Public/show_bug.cgi?id=15359

--- Comment #13 from Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> 2012-07-06 04:32:35 UTC ---
(In reply to comment #11)

> http://203.59.75.251/Bug15359
> 
> A simple testcase has been done, and [latest release versions of] all major
> browsers currently fail XML compliance due to this proposed handling of the BOM
> (some non-browser XML processors get this right, though).

Their behaviour *would have been* correct, if we changed XML to say this:

]] In the absence of information provided by an external transport protocol
   (e.g. HTTP or MIME) <INS> or a byte order mark</INS>,
   it is a fatal error for an entity including an encoding declaration to
   be presented to the XML processor in an encoding other than that named
   in the declaration, [[

As both HTTP and the BOM are "external" to the markup, such a change would
make sense.
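
To make the difference concrete, here is a rough sketch of the rule with and
without the inserted words. It is a simplified reading of the quoted sentence,
not a conforming XML processor. The function names, the `actual` parameter
(the encoding the bytes really are in) and the `bom_counts` switch are all
hypothetical, introduced only for illustration.

```python
import codecs

# BOM byte sequences considered by this sketch; UTF-32 is left out for brevity.
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def bom_encoding(data: bytes):
    """Return the encoding signalled by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def mismatch_is_fatal(data, declared, actual, transport_charset=None,
                      bom_counts=False):
    """Simplified reading of the quoted rule.

    declared          -- encoding named in the encoding declaration
    actual            -- encoding the entity is really presented in (assumed known)
    transport_charset -- charset from HTTP/MIME, or None if absent
    bom_counts        -- False: wording as quoted without <INS>;
                         True: wording with the proposed <INS> text
    """
    # No mismatch, no error. codecs.lookup() normalises alias spellings.
    if codecs.lookup(declared).name == codecs.lookup(actual).name:
        return False
    # External transport information is present, so the rule does not apply.
    if transport_charset is not None:
        return False
    # Proposed change: a BOM counts as external information too.
    if bom_counts and bom_encoding(data) is not None:
        return False
    return True


# A UTF-8 file with a BOM whose declaration says windows-1252, no HTTP charset.
doc = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="windows-1252"?><r/>'
print(mismatch_is_fatal(doc, "windows-1252", "utf-8"))
print(mismatch_is_fatal(doc, "windows-1252", "utf-8", bom_counts=True))
```

Under this reading, the only thing the inserted words change is the case where
a BOM is present and no transport charset is given.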

> Now, my position is unreservedly that, for compatibility with XML, the BOM must
> not be specified as overriding all other considerations in all cases. [The
> proposal for this Bug]

There are many aspects of "compatibility with XML". The most important
aspect is UTF-8 itself. The problem is that HTML defaults to Windows-1252,
whereas XML defaults to UTF-8. This means that, occasionally, an HTML page
can end up with an encoding - via the default or via manual overriding -
that differs from the author's intended encoding.

The second important aspect of compatibility with XML is the fact that it's
impossible to override the encoding of an XML document.

We can have both of these benefits in HTML too, if only one uses the BOM. This
benefit, however, comes at the expense of the HTTP charset: The BOM must be
allowed to override the HTTP charset. This is a price worth paying. Encodings
are an evil. We should try to reduce their importance as much as possible.
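
A minimal sketch of the precedence argued for here, assuming only three
sources of encoding information: the BOM, the HTTP charset and the HTML
default. The function name `pick_encoding` and its parameters are hypothetical,
and in-document declarations such as <meta charset> are deliberately left out.

```python
import codecs


def pick_encoding(body: bytes, http_charset=None, default="windows-1252"):
    """Pick an encoding with the BOM taking precedence over the HTTP charset.

    UTF-32 BOMs are not considered in this sketch.
    """
    # 1. A BOM, if present, wins over everything else.
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if body.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if body.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    # 2. Otherwise fall back to the charset parameter from HTTP Content-Type.
    if http_charset:
        return http_charset.lower()
    # 3. Otherwise use the default (Windows-1252 for HTML, as noted above).
    return default


page = codecs.BOM_UTF8 + "<p>blåbær</p>".encode("utf-8")
print(pick_encoding(page, http_charset="ISO-8859-1"))
```

With this ordering, a UTF-8 page carrying a BOM is read as UTF-8 even when the
server mislabels it, which is the case where the HTTP charset pays the price.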

> As for overriding the HTTP Content-Type parameter specifically, or user
> selection generally, my position is unchanged, for the reasons already given.

I don't understand your reasons. You are against letting the BOM override the
HTTP charset, yet you are in favour of letting the user override the BOM. I
see no benefit in that standpoint. I only see pessimism about the need for
users to override encodings.

NOTE: One reason that the BOM should override HTTP is that the BOM is likely to
be more correct. (Plus, WebKit and IE already behave like that.) If all
browsers implement IE's and WebKit's behaviour, the encoding errors should not
occur, and thus the user will have no need to override the encoding.

> In particular, the spec. should remain silent on the subject of users
> configuring their user agent to apply certain encodings to certain documents.
> How this may impact XML (in terms of whether this would be valid in a
> particular case) is unrelated to how it should impact (X)HTML5; to whatever
> degree that it is already specified in the XML spec., leave it at that.
> 
> Sometimes, users simply have to debug misdetected/misspecified encodings; the
> fact that I've just demonstrated a new encoding-related misbehavior is proof of
> that.

You have documented a discrepancy between what browsers do and what XML
specifies. You have not documented that what the browsers do leads to
any problems. For instance, the test page you created above works just
fine. You have not even expressed any wish to override its encoding.
So, I'm sorry, but the page you made does not demonstrate what you
claim it demonstrates.


Received on Friday, 6 July 2012 04:32:39 UTC