[Bug 13392] i18n-ISSUE-72: BOM as preferred encoding declaration

http://www.w3.org/Bugs/Public/show_bug.cgi?id=13392

--- Comment #7 from Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> 2011-07-28 14:01:50 UTC ---
Of course: the HTTP charset from the server can override not only an undeclared
encoding but also a declared encoding. The same is also the case for HTML
parsers.

The issue that some (seemingly *most*) parsers give preference to the BOM over
the HTTP charset, the meta charset element *and* the XML encoding declaration
is not limited to HTML parsers; it is also the case for most XML parsers. This
needs to be fixed in the parsers or in the HTTP spec(s).
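
To illustrate the behavior described above, here is a minimal Python sketch of
a sniffing order in which a BOM, if present, wins over the HTTP charset and
over any in-document declaration. The names `sniff_bom` and `pick_encoding`,
the order of the non-BOM fallbacks and the "utf-8" default are assumptions made
for illustration; this is not taken from any particular parser or spec, and it
only looks at UTF-8 and UTF-16 BOMs.

```python
import codecs

# BOM signatures considered in this sketch (UTF-8 and UTF-16 only).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def pick_encoding(data: bytes, http_charset=None, declared=None, default="utf-8"):
    """Hypothetical precedence: BOM, then HTTP charset, then the in-document
    declaration (meta charset or XML encoding declaration), then a default."""
    return sniff_bom(data) or http_charset or declared or default


# A BOM overrides both the HTTP charset and the declared encoding.
print(pick_encoding(b"\xef\xbb\xbf<p>x</p>", "iso-8859-1", "windows-1252"))
# Without a BOM, the HTTP charset overrides the declared encoding.
print(pick_encoding(b"<p>x</p>", "iso-8859-1", "windows-1252"))
```

A parser that honored the HTTP charset over the BOM would simply move
`http_charset` to the front of that chain.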

However, the reason for promoting the BOM as preferred was not related to that
bug/feature (which was only discovered after the text "(preferred)" was added)
but to the fact that the BOM is the only in-the-file method that applies to
both XML and HTML files. IMHO, disregarding this fact would require good
justification.

Henri earlier said that Polyglot Markup should be authored as a subset of the
specs and not as a subset of browser behavior. (My rewording.) From that p.o.v.
there should be no problem with promoting the BOM: it *is* the subset of both
specs when it comes to in-the-file encoding declarations.
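
As a small illustration of the BOM as an in-the-file declaration, the Python
sketch below prepends the UTF-8 BOM (bytes EF BB BF) to a document, so that the
same bytes carry the same encoding signal whether they are fed to an HTML or an
XML parser. The helper name `with_utf8_bom` and the sample markup are
hypothetical.

```python
import codecs


def with_utf8_bom(markup: str) -> bytes:
    """Encode markup as UTF-8 and prepend the UTF-8 BOM."""
    return codecs.BOM_UTF8 + markup.encode("utf-8")


doc = with_utf8_bom(
    "<!DOCTYPE html>"
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head><body></body></html>"
)

# The first three bytes are the BOM; the "utf-8-sig" codec strips it on decode.
print(doc[:3])
print(doc.decode("utf-8-sig")[:15])
```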

PS: Conforming XML parsers such as Firefox, Opera and Xmllint (from Libxml2) do
not permit changing the encoding from the declared (or default) one to
another one. For WebKit browsers and IE, the same behavior is currently linked
to the use of the BOM (and, at least for WebKit, this is the case for both
HTML and XML). This is what I had in mind when, in comment #1, I said that this
is an, quote, "XML-like feature in itself".


Received on Thursday, 28 July 2011 14:01:56 UTC