
Re: Polyglot Markup Formal Objection Rationale

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Tue, 6 Nov 2012 19:01:44 +0100
To: Lachlan Hunt <lachlan.hunt@lachy.id.au>
Cc: public-html@w3.org
Message-ID: <20121106190144243077.44a0cf02@xn--mlform-iua.no>
Lachlan Hunt, Tue, 06 Nov 2012 15:52:24 +0100:
> On 2012-11-06 15:17, Leif Halvard Silli wrote:

> The exceptions I listed are cases where the inclusion of certain 
> markup results in necessary, but semantically insignificant 
> differences from parsing, and where the markup is still conforming in 
> both serialisations.

It is not *necessary* to allow CDATA in polyglot markup or relax the 
whitespace restriction. But I can understand that you, as an author, 
find it (perhaps _very_) *impractical* to have those restrictions. 
Hence, I could live with your relaxation of <style>, <script> and 
whitespace.

>  Non-UTF-8 encodings are conforming in both 
> serialisations and there is no need for such a restriction.

Polyglot Markup is focused on authoring, on being a practical choice. 
Authors would be free to create such non-UTF-8 polyglot documents - no 
one would be able to prevent it. But those who do that kind of thing 
*and* still want to retain the 'polyglot' badge do not, in fact, need 
that badge …

Also, I note that on the one hand you advocate, for authoring reasons, 
relaxing the rules for white-space and CDATA. On the other hand you 
want to open the door to encodings that would be very difficult for 
authors to deal with, given the restrictions on how to declare them.

> I will maintain an objection to any normative definition of polyglot 
> markup that imposes additional restrictions on conforming markup that 
> are not derived directly from the conforming intersection of the HTML 
> and XHTML serialisations.

If an authoring-driven desire for CDATA inside polyglot markup can be 
labelled as "derived directly from the conforming intersection of the 
HTML and XHTML serialisations", then how can one credibly claim that 
the UTF-8 restriction does not come straight from the HTML and XHTML 
serialisations as well?

<signFromAbove>Because, after all, HTML5 forbids <meta 
charset="UTF-16"/> and <?xml version="1.0" encoding="FOO" ?> in 
'text/html'. And it forbids <meta http-equiv="Content-Type" 
content="text/html; charset=FOO" /> and <meta charset="NON_utf8" /> in 
XHTML5. Except that it permits the latter element if its @charset value 
is "UTF-8" (<meta charset="UTF-8"/>).</signFromAbove>
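To spell the intersection out (my own illustrative sketch, not wording 
from either specification): the only in-document encoding declaration 
that is conforming in both serialisations is the UTF-8 meta charset - 
which is why the UTF-8 restriction falls out of the intersection just 
as directly as the CDATA rules do.

```html
<!-- Conforming in BOTH text/html and XHTML5: -->
<meta charset="UTF-8"/>

<!-- Conforming in XHTML5 only; the XML declaration is
     forbidden in text/html: -->
<?xml version="1.0" encoding="UTF-8"?>

<!-- Conforming in text/html only; in XML documents the
     charset attribute, if present, must be "UTF-8": -->
<meta charset="ISO-8859-1"/>
```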

> That is, if something is conforming in both serialisations and does 
> not result in a significant semantic difference in interpretation 
> between HTML and XML parsers, then it should be considered conforming 
> polyglot markup.
> 
> I have no objection, however, to strongly recommending the use of 
> UTF-8, as long as it is non-normative.

For various reasons - including the vetting process it has gone 
through - I am not going to back down from the UTF-8 requirement.

But I am willing to make one concession: I would not be opposed to an 
informative note attached to the principles, which said something like 
this: 

  "As long as one either uses the UTF-16 encoding (with a BOM) or
   controls the encoding externally (e.g. via HTTP), the rules
   of this specification would turn any XHTML document into an 
   HTML-compatible one. However, as far as this specification is 
   concerned, only UTF-8 encoded documents are considered to be
   conforming."
-- 
Leif Halvard Silli 
Received on Tuesday, 6 November 2012 18:02:20 GMT
