[Bug 15359] Make BOM trump HTTP

https://www.w3.org/Bugs/Public/show_bug.cgi?id=15359

--- Comment #16 from theimp@iinet.net.au 2012-07-06 13:41:01 UTC ---
> Their behaviour *would have been* correct, if we changed [something else] to say this:

That basically summarizes every problem with everything, ever.

> The second important aspect of compatibility with XML is the fact that it's impossible to override the encoding of an XML document.

Not always impossible; in fact, never impossible. It is just usually not
possible without also generating a fatal error.

Developers, for example, might want to do this even if it generates an error.

Also, beyond browsers, fatal errors might not be so total. For example, a
graphical editor such as Amaya should still be able to load the document and
expose the source text (which the processor may still do after a fatal
error). Doing so requires that it detect an encoding, and since that detection
could be wrong with respect to the intended content, the author must be able to
override it (especially if the editor means to fix the incorrect BOM!).
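A minimal sketch of what such an editor has to do, under simplifying assumptions: sniff a BOM to guess the encoding, let an explicit author override take priority over it, and save the edited text back in the chosen encoding. The names `detect_encoding`, `load_source` and `save_source` are hypothetical, and only the UTF-8 and UTF-16 BOMs are recognised.

```python
from typing import Optional, Tuple

# Hypothetical BOM table: UTF-8 and UTF-16 only (UTF-32 omitted for brevity).
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
]


def detect_encoding(data: bytes) -> Tuple[Optional[str], int]:
    """Hypothetical helper: (encoding, BOM length) for a known BOM, else (None, 0)."""
    for bom, enc in BOMS:
        if data.startswith(bom):
            return enc, len(bom)
    return None, 0


def load_source(data: bytes, override: Optional[str] = None,
                fallback: str = "utf-8") -> str:
    """Hypothetical editor loader: an explicit author override beats the BOM."""
    if override is not None:
        # Decode every byte, BOM included, so a wrong BOM becomes visible,
        # deletable characters instead of being silently trusted.
        return data.decode(override, errors="replace")
    enc, bom_len = detect_encoding(data)
    if enc is not None:
        # Trust the BOM and drop its bytes from the decoded text.
        return data[bom_len:].decode(enc, errors="replace")
    return data.decode(fallback, errors="replace")


def save_source(text: str, encoding: str) -> bytes:
    """Hypothetical saver: re-encode the edited text in the chosen encoding.

    With codecs such as iso-8859-1 or utf-8 no BOM is written (Python's
    plain "utf-16" codec, by contrast, adds one itself).
    """
    return text.encode(encoding)
```

For example, bytes that begin with a UTF-8 BOM but actually hold ISO-8859-1 text decode with a replacement character by default; with `override="iso-8859-1"` the three BOM bytes appear as `ï»¿`, the author can delete them, and `save_source(..., "iso-8859-1")` writes clean ISO-8859-1.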

> We can have both of these benefits in HTML too, if only one uses the BOM. This benefit, however, comes at the expense of HTTP charset: The BOM must be allowed to override the HTTP charset. This is a price worth paying. Encodings is an evil. We should try to remove their importance as much as possible.

I am sympathetic to your ends, but not your means.

See also: http://xkcd.com/927/

> I don't understand your reasons. You are CONTRA that the BOM overrides the HTTP charset. But you are PRO that the user can override the BOM.

I'm PRO that the user can override just about anything. The web and the
software exist for the user, not the other way around.

As soon as the user changes any configuration option - including installing an
extension - all bets are off, and the spec. should not have to - or try to -
account for such scenarios.

> You have documented a discrepancy between what browsers do and what XML specifies.

That should be enough.

> You have not documented that what the browsers do lead to any problems.

It will lead to problems when this happens: some browser - maybe a current one,
maybe a new one - begins to obey the spec., which it could do at any time. Then
you're right back to different renderings again. HTML5 is so meticulous about
every detail precisely because of past problems where one browser did something
wrong, another browser decided to fix it, and then the whole web was split in
half (or worse).

> For instance, the test page you created above, works just fine.

It "works" with current major browsers. Not all user agents are browsers. For
example, many validators treat it according to the spec., which means a fatal
error.

The following are some examples of online validators that (correctly) determine
the example to be invalid:

 http://validator.nu/ and http://validator.w3.org/nu/
 http://validator.aborla.net/

This validator detects the error, but does not consider it fatal:

 http://www.validome.org/

Also, at least some older versions of some browsers do produce an error on this
page: specifically, Epiphany 2.30.6 (still the default install on the latest
release of Debian, at the time of writing). Since this particular version of
Epiphany is Gecko-based, older Firefox releases may well behave the same way,
and a significant share of Firefox users upgrade very slowly; this could add up
to a lot of browsers. I might try to test this further.

Also, it doesn't "work just fine", because in this case I, the author,
expected something different. This is an example of how browsers violate the
XML spec. "Working just fine" would mean doing what I specified, which is also
what the spec. says to do: the page was very specifically coded that way, with
the explicit intention of causing a fatal error for testing/demonstration
purposes. This is the problem with second-guessing what you are explicitly
told, in general.

> You have not even expressed any wish to override their encoding.

That is a different argument, and not in the original scope of this bug. You
were the one to first mention how important compliance with the XML spec. is;
now that I have shown that, in fact, it is the action of this bug which is
non-compliant, you want to ignore that and argue about user overrides instead.

> So, I'm sorry, but the page you made does not demonstrate what you claim it to demonstrate.

It demonstrates exactly what I claim it to; and no more. I only claim it to
demonstrate that this bug, as originally filed, which says that the BOM should
be "considered more important than anything else when it comes to determining
the encoding of the resource", is incompatible with XML.
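To make the incompatibility concrete, here is a minimal sketch of the two precedence rules under simplifying assumptions: `decode_bom_first` follows the rule proposed in this bug (a BOM beats the HTTP charset), while `decode_xml_style` treats the HTTP charset as authoritative, which is the reading of XML this comment relies on. All names, including `FatalError`, are hypothetical, only UTF-8 and UTF-16 BOMs are recognised, and the "must start with markup" check is a crude stand-in for a real XML parser.

```python
from typing import Optional

# Hypothetical BOM table: UTF-8 and UTF-16 only (UTF-32 omitted for brevity).
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
]


class FatalError(Exception):
    """Hypothetical stand-in for an XML well-formedness (fatal) error."""


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None."""
    for bom, enc in BOMS:
        if data.startswith(bom):
            return enc
    return None


def strip_signature(text: str) -> str:
    """Drop a decoded U+FEFF, which is an encoding signature, not content."""
    return text[1:] if text.startswith("\ufeff") else text


def decode_bom_first(data: bytes, http_charset: Optional[str]) -> str:
    """Rule proposed in this bug: the BOM beats the HTTP charset."""
    enc = sniff_bom(data) or http_charset or "utf-8"
    return strip_signature(data.decode(enc))


def decode_xml_style(data: bytes, http_charset: Optional[str]) -> str:
    """XML-style rule (simplified): an HTTP charset, if given, is authoritative."""
    enc = http_charset or sniff_bom(data) or "utf-8"
    text = strip_signature(data.decode(enc))
    # Crude stand-in for a parser: the document must start with markup.
    # A UTF-8 BOM read as ISO-8859-1 becomes U+00EF U+00BB U+00BF, which is
    # not a U+FEFF signature, so it fails this check.
    if not text.lstrip(" \t\r\n").startswith("<"):
        raise FatalError(f"text before the first markup when decoded as {enc}")
    return text
```

Given the same bytes with an ISO-8859-1 HTTP charset, the first function silently returns the UTF-8 text (the IE/WebKit behaviour), while the second raises `FatalError`, which corresponds to what the validators listed earlier report.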

> If all browsers implement the IE/Webkit behaviour, then there is no problem. If you know that it is a problem, then you should provide evidence thereof - for instance by pointing to a page that gets broken if this behaviour is implemented.

I haven't seen much recently; I'll have a look and post what I find.

> You are welcome to demonstrate that it is an actual problem.

Try those validators. Though I expect that won't satisfy you.

> But note that to disallow users to override the encoding, breaks no spec.

And it would also break no spec. for me to write a browser plugin that lets me
configure the character encoding. Your proposal doesn't actually achieve
anything if what you want is perfect, unconditional control over something that
a user may want to change, for whatever reason suits them. All it does is shift
the problem down the line a bit (from browser developer -> plugin developer).

It is just beyond the scope of the HTML specification to command the browser to
this extent.

Now, since the standards of evidence are so high, I would like you to
demonstrate the problem that you claim: where is the proof that users who
receive documents with BOMs nevertheless cause havoc by manually changing the
encoding settings of their browser without knowing exactly what they are doing,
or at least without understanding that doing so makes them alone responsible for
"shooting themselves in the foot"?

In fact, it seems to me that users would only change the encoding manually in
cases where there is an encoding detection problem. That implies either that
such documents must exist, or else that this is never a problem that would
actually occur, apart from idle curiosity, of course, which gives no basis for
direction in this spec.


Received on Friday, 6 July 2012 13:41:03 UTC