Re: Polyglot Markup/XML encoding declaration from Leif Halvard Silli on 2010-08-02 (public-html@w3.org from August 2010)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Mon, 2 Aug 2010 10:29:57 +0200
To: Tantek Çelik <tantek@cs.stanford.edu>
Cc: Maciej Stachowiak <mjs@apple.com>, Lachlan Hunt <lachlan.hunt@lachy.id.au>, HTMLwg <public-html@w3.org>, Eliot Graff <eliotgra@microsoft.com>, public-i18n-core@w3.org
Message-ID: <20100802102957494827.c34c8643@xn--mlform-iua.no>

Tantek Çelik, Sun, 1 Aug 2010 17:40:55 -0700:
> On Sun, Aug 1, 2010 at 5:05 PM, Maciej Stachowiak <mjs@apple.com> wrote:
  [...]

> Indeed, I was a bit shocked to even see the XML declaration suggested
> for polyglot documents - as the XML decl = quirks mode is well known
> by professional web authors/designers/developers.

Indeed. But Polyglot Markup is supposed to be based on spec inference.
The XML declaration is not a quirks mode trigger according to the HTML5
spec. IE6 also isn't considered compatible with MathML and SVG, so you
can only serve a subset of Polyglot Markup to IE6 anyhow.

  [...]
> The *fewer* things they have to remember or worry about, the better.

If spec inference leads to few things to remember, then fine.

> Thus not only should we reject the XML declaration in particular, but
> we should categorically reject adding *anything* into the suggested
> markup patterns that isn't absolutely essential for polyglot documents
> to function as expected / similarly.

If we are more restrictive than HTML5 requires from us, then we are not 
"testing" HTML5. To widen or restrict HTML5's rules, we should file 
bugs. We should not require/restrict things just because we think it is 
a good idea.

> The burden of proof must be on those who want to recommend additional
> markup/code, to demonstrate how omitting such markup would cause a
> problem with real world (X)HTML5 documents.  Otherwise we shouldn't
> even bother to consider it.

Real world is not the principle. Spec inference is.

> Plenty of folks are able to publish biglot/polyglot (X)HTML5 documents
> *today* without the XML declaration, thus we should have rejected the
> suggestion immediately.

There is nothing wrong with the immediacy of this list - just look at 
Henri's responses. However, "the suggestion" did not start on this 
list. This is a history of "the suggestion": 

	(1) When it comes to the actual, factual polyglot spec draft, then - 
for better or worse - it did not start with the assumption that the XML 
(encoding) declaration should be forbidden.
	(2) When it comes to the actual, factual removal of the XML 
declaration from the Polyglot Markup spec, then, my response happened 
before any discussion on this list. [1][2] 
	(3) However, those bug reports did not take into account the fact
that - via external encoding info - one can use any encoding in polyglot
markup. The catch is that HTML5 permits meta@charset to specify _any_
encoding on the HTML side, while it does not permit the same to be done
on the XHTML side. That way, polyglot markup gets uneven.
	(4) One way to make it even is to permit the XML (encoding)
declaration. Yes, I initiated that debate.
	(5) Another way is to make it illegal - in HTML5/XHTML5 itself(!) -
to let <meta charset="*"/> contain any other value than "UTF-8",
whenever it occurs in an XHTML5 context (bug 10283 [3]). This is
reasonable, as I don't see how allowing any encoding other than UTF-8
inside meta@charset supports the justification found in HTML5: "to
facilitate migration to and from XHTML".
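
To illustrate the unevenness described in (3) to (5), here is a minimal
Python sketch. It is not taken from any spec or parser: the function
names are hypothetical, the regular expressions are a rough
approximation, and it looks only at in-document declarations, ignoring
BOMs and external (HTTP) encoding info.

```python
import re

# Hypothetical helpers for illustration only. Real parsers follow the
# HTML5 encoding sniffing algorithm and XML 1.0 autodetection rules.
META_CHARSET = re.compile(
    rb'<meta\s+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.I)
XML_DECL_ENCODING = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def html_side_encoding(doc: bytes):
    """Label an HTML parser could take from <meta charset>, or None."""
    m = META_CHARSET.search(doc)
    return m.group(1).decode("ascii").lower() if m else None


def xml_side_encoding(doc: bytes):
    """Label an XML parser assumes from the document alone."""
    m = XML_DECL_ENCODING.match(doc)
    # Without an encoding declaration (and without a BOM or external
    # info), an XML parser treats the document as UTF-8.
    return m.group(1).decode("ascii").lower() if m else "utf-8"


def declarations_agree(doc: bytes) -> bool:
    """True when both sides get the same in-document encoding label.

    Assumption of this sketch: a missing <meta charset> counts as
    disagreement, since the HTML side then gets no in-document info.
    """
    return html_side_encoding(doc) == xml_side_encoding(doc)


# The uneven case from (3): only the HTML side is told the encoding.
uneven = b'<!DOCTYPE html><html><head><meta charset="windows-1252"/>'
# Option (4): the XML encoding declaration evens it out.
with_decl = b'<?xml version="1.0" encoding="windows-1252"?>' + uneven
# Option (5): meta@charset restricted to UTF-8.
utf8_only = b'<!DOCTYPE html><html><head><meta charset="UTF-8"/>'
```

With these inputs, declarations_agree() is False for `uneven` and True
for both `with_decl` and `utf8_only`.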

To implement (5) would not make it illegal for polyglot markup to use
other encodings than UTF-8 and UTF-16. It would, however, make certain
that a polyglot markup document always is "even" when it comes to the
encoding information sent to HTML and XML parsers. For both HTML parsers
and XML parsers, enough info to detect the encoding (as either UTF-16 or
UTF-8) would always be present. Even an HTML5-conforming parser would
always detect the encoding of a BOM-less UTF-8 encoded document.
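
The detection claim in the paragraph above can be sketched roughly as
follows. This is only an illustration under the assumption that (5) is
in force; the function name is hypothetical and the check is far simpler
than what a real HTML or XML parser does.

```python
import codecs
import re

# Rough approximation of a <meta charset> lookup, for illustration only.
META_CHARSET = re.compile(
    rb'<meta\s+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.I)


def sniff_in_document_encoding(doc: bytes):
    """Return 'utf-8' or 'utf-16' from in-document info, else None.

    None means that only external encoding info (e.g. HTTP) can tell.
    """
    # A BOM is understood by HTML parsers and XML parsers alike.
    if doc.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if doc.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # BOM-less document: under (5) the only value meta@charset may carry
    # is UTF-8, which is also what an XML parser defaults to.
    m = META_CHARSET.search(doc)
    if m and m.group(1).lower() == b"utf-8":
        return "utf-8"
    return None
```

A BOM-less document carrying <meta charset="UTF-8"/> is reported as
UTF-8, while a document labelled with any other encoding falls through
to None, meaning that it depends on external encoding info.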

[1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=9962
[2] http://www.w3.org/Bugs/Public/show_bug.cgi?id=9963
[3] http://www.w3.org/Bugs/Public/show_bug.cgi?id=10283
-- 
leif halvard silli

Received on Monday, 2 August 2010 08:30:36 UTC