Re: Polyglot Markup Formal Objection Rationale from Lachlan Hunt on 2012-11-06 (public-html@w3.org from November 2012)

From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
Date: Tue, 06 Nov 2012 14:37:17 +0100
To: public-html@w3.org
Message-ID: <5099128D.6050008@lachy.id.au>
On 2012-11-05 15:04, Smylers wrote:
>>> Surely the definition of polyglot mark-up is simply a statement
>>> saying something along the lines of[*1] a document is conforming
>>> polyglot if it conforms to both the XML and text/html requirements
>>> of HTML5 and has the same meaning in both serializations -- that is,
>>> it's a definition of the principle, by reference.
>>>
>>> All the details and implications of what that means are simply
>>> applying the normative requirements of the HTML spec, so they aren't
>>> themselves defining anything.
>
> * The definition of the term "polyglot markup" being normative (it
>    currently isn't) and itself referring to normative definitions in
>    the HTML spec.
>
> * The consequences of that definition, the description of what it means,
>    not being normative (they currently claim to be).
>
> Would you be satisfied with that, or do you want the description parts
> to be normative as well?

Subject to the condition that the spec clearly states that everything 
else in the document is non-normative, I would be satisfied with a 
normative definition of the term "polyglot markup" (or similar) as being 
markup that conforms with the intersection of the HTML and XHTML 
serialisations, such that the markup meets the following constraints:

1. Conforms to the syntactic requirements of the HTML serialisation
2. Conforms to the syntactic requirements of the XHTML serialisation
    (including well-formedness)
3. Results in a *conforming document* when parsed with either an HTML or
    XML parser
4. Results in equivalent tree representations (e.g. DOM) when parsed
    using either HTML or XML parsers, subject to the known exceptions
    for:
    a. xml, xmlns and xlink namespaced attributes;
    b. any insignificant differences in the value of textContent
       for script and style elements;
    c. any semantically insignificant whitespace differences.
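
As a rough illustration of constraints 2 and 4 above, the following
Python sketch parses the same markup twice and compares simplified
trees. It is only a sketch under loud assumptions: Python's html.parser
is a tokenizer, not a conforming HTML5 parser (it would, for instance,
accept <div/> as self-closing, which an HTML5 parser does not), so a
real check would need a conforming HTML parser. The names trees_match,
_keep_attr and _norm_text are hypothetical helpers, not part of any
spec or library.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


def _norm_text(s):
    # Exception (c): collapse whitespace before comparing text nodes.
    return " ".join(s.split())


def _keep_attr(name):
    # Exception (a): ignore xml, xmlns and xlink attributes. Names that
    # start with "{" are namespaced attributes as reported by ElementTree.
    return not (name == "xmlns"
                or name.startswith(("xmlns:", "xml:", "xlink:", "{")))


class _TreeBuilder(HTMLParser):
    """Builds (tag, attrs, children) tuples from html.parser events."""

    def __init__(self):
        super().__init__()
        self.root = ("#document", (), [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        kept = tuple(sorted((k, v or "") for k, v in attrs if _keep_attr(k)))
        node = (tag, kept, [])
        self.stack[-1][2].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1 and self.stack[-1][0] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = _norm_text(data)
        if text:
            self.stack[-1][2].append(text)


def _from_etree(elem):
    # Convert an ElementTree element into the same tuple shape.
    tag = elem.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix
    attrs = tuple(sorted((k, v) for k, v in elem.attrib.items()
                         if _keep_attr(k)))
    children = []
    if elem.text and _norm_text(elem.text):
        children.append(_norm_text(elem.text))
    for child in elem:
        children.append(_from_etree(child))
        if child.tail and _norm_text(child.tail):
            children.append(_norm_text(child.tail))
    return (tag, attrs, children)


def trees_match(markup):
    """True if markup is well-formed XML and both simplified trees agree."""
    try:
        xml_tree = _from_etree(ET.fromstring(markup))
    except ET.ParseError:
        return False  # fails constraint 2 (well-formedness)
    builder = _TreeBuilder()
    builder.feed(markup)
    builder.close()
    html_roots = [n for n in builder.root[2] if isinstance(n, tuple)]
    return len(html_roots) == 1 and html_roots[0] == xml_tree


DOC = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head><meta charset="utf-8"/><title>Demo</title></head>
<body><p>Hello <em>world</em>!<br/></p></body>
</html>"""
```

Calling trees_match(DOC) exercises the xmlns and xml:lang exceptions;
markup with an unclosed <br> is rejected at the well-formedness step.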

>> For example as both in HTML5 and in XML you have some variety in
>> choosing encoding, Polyglot must *normatively* define that only
>> allowed encoding is UTF-8.
>
> It can do that by reference; it doesn't need to do it explicitly.
> Clearly by the definition polyglot HTML (being the overlap of text/html
> and XHTML) a conforming polyglot document needs to use an encoding
> which:
>
> * Is allowed in conforming text/html.
> * Is allowed in conforming XHTML.
> * Can be declared in a way which is conforming in both representations,
>    and has the same meaning in both.
>
> If the only encoding that turns out to meet those requirements is UTF-8
> then it necessarily follows that polyglot HTML documents must use UTF-8.

UTF-8 is not the only encoding that meets those requirements.  A 
conforming HTML or XHTML document may use UTF-16 with a byte order mark, 
or any encoding which is declared outside the document (e.g. in the HTTP 
Content-Type header).  The fact that, for implementations, UTF-8 is "the 
only character encoding for which both HTML and XML require support" 
does not affect the conformance of documents using alternative encodings 
with respect to the requirements of either the HTML or XHTML serialisations.

There are certainly very good reasons to choose UTF-8 over the 
alternatives and I have no problem with it non-normatively recommending 
UTF-8. But by requiring UTF-8, Polyglot Markup is imposing an additional 
constraint that goes beyond the requirements of HTML5.
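
To make the two non-UTF-8 cases above concrete, here is a small Python
sketch that reports what a byte order mark and an HTTP Content-Type
header each say about a document's encoding. It deliberately does not
decide which source wins when they disagree, and it ignores UTF-32.
The function names bom_encoding and header_charset are hypothetical,
chosen only for this illustration.

```python
import codecs
from email.message import Message
from typing import Optional


def bom_encoding(body: bytes) -> Optional[str]:
    """Return the encoding implied by a leading byte order mark, if any."""
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # Assumption: UTF-32 is not considered, so FF FE is read as UTF-16LE.
    if body.startswith(codecs.BOM_UTF16_LE):
        return "utf-16le"
    if body.startswith(codecs.BOM_UTF16_BE):
        return "utf-16be"
    return None


def header_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, or None if absent


# A UTF-16 document with a BOM, and one declared only at the HTTP level.
utf16_doc = "\ufeff<!DOCTYPE html><html></html>".encode("utf-16-le")
print(bom_encoding(utf16_doc))
print(header_charset("application/xhtml+xml; charset=ISO-8859-1"))
```

Neither document here contains an in-document encoding declaration,
which is why the question of a declaration that means the same thing
in both serialisations does not arise for them.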


Another issue is the section talking about how to include scripts and 
stylesheets.  It is conforming in both HTML and XHTML to include scripts 
inline, and Polyglot Markup's requirement to only link to external 
scripts and stylesheets is another additional constraint that goes 
beyond the requirements of HTML5.  It's also somewhat self-contradictory 
in its present state, as section 9 says to only use external scripts and 
section 9.2 contradicts that by saying that "safe content" may be used 
inline.

On a related note, Polyglot Markup also fails to describe alternative 
techniques for including scripts inline, such as the <![CDATA[ trick:

<script>//<![CDATA[
...
//]]></script>

With the caveat that the .textContent of the script element would differ 
slightly between HTML and XML parser interpretations, and that polyglot 
serialisers would need to ensure this is preserved correctly in output 
when used, it's likely to be perfectly adequate for many applications of 
polyglot markup.
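
The textContent caveat can be seen with a short Python sketch that
extracts the script text both ways. This assumes Python's html.parser
treats script content as raw text in the same way an HTML parser would
for this input; it is not a conforming HTML5 parser in general. The
names html_script_text and xml_script_text are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

SNIPPET = "<script>//<![CDATA[\nvar ok = 1 < 2;\n//]]></script>"


class _ScriptText(HTMLParser):
    """Collects the raw text found inside script elements."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.chunks.append(data)


def html_script_text(markup):
    # HTML view: the CDATA markers are ordinary characters in the script.
    parser = _ScriptText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


def xml_script_text(markup):
    # XML view: the CDATA section delimiters are consumed by the parser.
    return ET.fromstring(markup).text


print(repr(html_script_text(SNIPPET)))
print(repr(xml_script_text(SNIPPET)))
```

In the HTML view the first and last lines are the comments
"//<![CDATA[" and "//]]>"; in the XML view they are bare "//" comments.
Both remain valid script source, which is the point of the trick.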

-- 
Lachlan Hunt
http://lachy.id.au/
http://www.opera.com/
Received on Tuesday, 6 November 2012 13:37:45 UTC