[Bug 15359] Make BOM trump HTTP

https://www.w3.org/Bugs/Public/show_bug.cgi?id=15359

--- Comment #11 from theimp@iinet.net.au 2012-07-06 02:45:36 UTC ---
First, I'm sorry that I have not been explaining myself well. I will try to be
more careful.

> 1) This XML declaration is invalid as it lacks the version attribute.

It was just an example that I didn't spell out properly. Sorry, I should have
put a bit more thought into it.

> 2) There are two characters, 0xFE 0xFF, in front of the declaration. 
> Wrong. See my 'Note regarding 2)' above.

Yes, I see my phrasing mistake now; you're right *that it is an error*.

However, you're not right that it is *not* also a fatal error to treat the
document as UTF-16 because of the BOM (if, say, the parser wants to carry on
and see what other errors it finds, it should use the encoding that was
specified).

What I meant was that, irrespective of whether the documents are well-formed,
obeying the BOM in that case is, without question, WRONG.
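The case above can be made concrete with a minimal, hypothetical sketch. It
assumes the server declares charset=iso-8859-1 over HTTP and that the body
starts with the bytes 0xFE 0xFF followed by plain ASCII markup; the function
names, the BOM table and the sample body are illustrative only and are not
taken from any spec or from the testcase. It simply shows how the two
policies read the same bytes differently.

```python
# Hypothetical sketch: two decoding policies applied to the same body.
# Assumption: HTTP declares charset=iso-8859-1 and the body begins with the
# bytes 0xFE 0xFF followed by single-byte (ASCII) markup.

# Byte order marks recognised by the "BOM trumps HTTP" policy.
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]


def decode_bom_first(body: bytes, http_charset: str) -> str:
    """A leading BOM overrides the HTTP charset (the proposal in this bug)."""
    for bom, encoding in BOMS:
        if body.startswith(bom):
            # Strip the BOM and decode the rest; undecodable bytes become U+FFFD.
            return body[len(bom):].decode(encoding, errors="replace")
    return body.decode(http_charset)


def decode_http_first(body: bytes, http_charset: str) -> str:
    """The externally declared charset is used as given, BOM or not."""
    return body.decode(http_charset)


body = b'\xfe\xff<?xml version="1.0"?><doc/>'

as_latin1 = decode_http_first(body, "iso-8859-1")
as_bom = decode_bom_first(body, "iso-8859-1")

# Under ISO-8859-1 the two leading bytes are ordinary characters that sit
# ahead of the XML declaration.
print(ascii(as_latin1[:7]))  # '\xfe\xff<?xml'
# Under the BOM-first policy the ASCII markup is read as UTF-16BE code units,
# so the declaration does not survive at all.
print(as_bom.startswith("<?xml"))  # False
```

With the HTTP charset honoured, the processor sees the document the server
described and can report the stray leading characters; with the BOM honoured,
it silently reads a different document.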

> For compatibility with deployed content, the byte order mark (also known as BOM) is considered more authoritative than anything else.

Too bad; it's wrong:

http://203.59.75.251/Bug15359

A simple testcase is available there, and the latest release versions of all
major browsers currently fail XML compliance because of this proposed handling
of the BOM (though some non-browser XML processors get it right).

Now, my position is unreservedly that, for compatibility with XML, the BOM must
not be specified as overriding all other considerations in all cases (which is
what this bug proposes).

As for overriding the HTTP Content-Type parameter specifically, or user
selection generally, my position is unchanged, for the reasons already given.

In particular, the spec. should remain silent on the subject of users
configuring their user agent to apply certain encodings to certain documents.
How this may affect XML (in terms of whether it would be valid in a particular
case) is unrelated to how it should affect (X)HTML5; to whatever degree it is
already specified in the XML spec., leave it at that.

Sometimes, users simply have to debug misdetected/misspecified encodings; the
fact that I've just demonstrated a new encoding-related misbehavior is proof of
that.


Received on Friday, 6 July 2012 02:45:38 UTC