[Bug 15359] Make BOM trump HTTP

From: <bugzilla@jessica.w3.org>
Date: Fri, 06 Jul 2012 04:27:42 +0000
Message-Id: <E1Sn08s-0008Fp-6x@jessica.w3.org>
To: public-html-bugzilla@w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=15359

--- Comment #12 from theimp@iinet.net.au 2012-07-06 04:27:41 UTC ---
> BOM is a side track. My question was: Do you dislike the "user experience" of XML when it comes to its prohibition against manual encoding overriding? Because XML's user experience is the same regardless of whether there is a BOM or not. If you can live with XML's _general_ prohibition against manual encoding overriding, then I don't see why you can't also live with the same strict rule for the _specific_ subset of HTML when there is a BOM.

For strict, valid, well-formed XML served with an explicit XML Content-Type,
no, I have no problem with the idea.

My problem is applying those rules to billions of pages that are *not* strict,
valid, well-formed XML served with an explicit XML Content-Type.

The difference is that practically no strict, valid, well-formed XML served
with an explicit XML Content-Type will have an incorrect BOM plus a
contradictory charset indicator of any other kind. The same cannot be said for
other web content (if it could, I would probably accept that too).

For this reason, as far as the user-configured option is concerned, consistency
is better served by having HTML5 say nothing, and letting user agents forbid
the override in the circumstances where they think that appropriate (such as
XML).

> Correct. It is currently also against the HTTP specs.

I'm very sorry, but I did a lot of research and I can't find anything that says
this. Is this a current spec. or a draft spec.?

See, RFC 2616 says:
"HTTP/1.1 recipients MUST respect the charset label provided by the sender"
"user agents [...] MUST use the charset from the content-type field if they
support that charset"

Meanwhile, RFC 3023 keeps saying, over and over:
"[the HTTP Content-Type] charset parameter is authoritative"

I know that no-one here seems to like any specs. other than this one that
they're writing, but I just don't see any way that this is not just another
"willful violation".
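
To make the conflict concrete, here is a minimal sketch of the two precedence
rules being argued over. It is not taken from any spec or browser; the names
sniff_bom and pick_encoding and the bom_trumps_http flag are hypothetical, and
only the UTF-8 and UTF-16 BOMs are checked.

```python
import codecs

# BOM signatures checked in this sketch (UTF-8 and both UTF-16 byte orders).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(body):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    return None


def pick_encoding(body, http_charset, bom_trumps_http):
    """Choose an encoding under one of the two policies (hypothetical helper).

    bom_trumps_http=False: the Content-Type charset is authoritative, which is
    how I read the RFC 2616 / RFC 3023 text quoted above.
    bom_trumps_http=True: the behaviour proposed in this bug.
    """
    bom = sniff_bom(body)
    if bom_trumps_http and bom is not None:
        return bom
    if http_charset:
        return http_charset.lower()
    # No HTTP charset: fall back to the BOM, or None to let other rules decide.
    return bom


body = codecs.BOM_UTF8 + "é".encode("utf-8")
http_wins = pick_encoding(body, "ISO-8859-1", bom_trumps_http=False)
bom_wins = pick_encoding(body, "ISO-8859-1", bom_trumps_http=True)
```

For the same bytes and the same header, the two policies give different
answers, which is the whole disagreement.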

> So, now you are offering me at least one use case: To allow users to place the page in quirks-mode. Frankly: I dismiss that use case.

Better than getting unreadable garbage because someone specified an incorrect
BOM/charset combination on some 10-year-old document.
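
As an illustration of the kind of garbage I mean, here is a small sketch. The
document is made up: I am assuming a legacy windows-1252 page that somehow
acquired a UTF-8 BOM (say, from a template saved by a different editor).

```python
import codecs

# Hypothetical legacy page: windows-1252 content behind a stray UTF-8 BOM.
legacy_text = "café"
body = codecs.BOM_UTF8 + legacy_text.encode("cp1252")

# If the BOM is trusted and cannot be overridden, the body is decoded as UTF-8.
# The lone 0xE9 byte is not valid UTF-8, so it becomes U+FFFD.
forced_utf8 = body.decode("utf-8-sig", errors="replace")

# A manual override to the real encoding (skipping the 3 BOM bytes) recovers
# the original text.
overridden = body[len(codecs.BOM_UTF8):].decode("cp1252")
```

With the override forbidden, the reader is left with forced_utf8 and no way to
get to overridden.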

Received on Friday, 6 July 2012 04:27:43 GMT
