- From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
- Date: Tue, 06 Nov 2012 14:37:17 +0100
- To: public-html@w3.org
On 2012-11-05 15:04, Smylers wrote:
>>> Surely the definition of polyglot mark-up is simply a statement
>>> saying something along the lines of[*1] a document is conforming
>>> polyglot if it conforms to both the XML and text/html requirements
>>> of HTML5 and has the same meaning in both serializations -- that is,
>>> it's a definition of the principle, by reference.
>>>
>>> All the details and implications of what that means are simply
>>> applying the normative requirements of the HTML spec, so they aren't
>>> themselves defining anything.
>
> * The definition of the term "polyglot markup" being normative (it
> currently isn't) and itself referring to normative definitions in the
> HTML spec.
>
> * The consequences of that definition, the description of what it means,
> not being normative (they currently claim to be).
>
> Would you be satisfied with that, or do you want the description parts
> to be normative as well?
Subject to the condition that the spec clearly states that everything
else in the document is non-normative, I would be satisfied with a
normative definition of the term "polyglot markup" (or similar) as being
markup that conforms with the intersection of the HTML and XHTML
serialisations, such that the markup meets the following constraints:
  1. Conforms to the syntactic requirements of the HTML serialisation
  2. Conforms to the syntactic requirements of the XHTML serialisation
     (including well-formedness)
  3. Results in a *conforming document* when parsed with either an HTML or
     XML parser
  4. Results in equivalent tree representations (e.g. DOM) when parsed
     using either HTML or XML parsers, subject to the known exceptions
     for:
     a. xml, xmlns and xlink namespaced attributes.
     b. Any insignificant differences in the value of textContent
        for script and style elements.
     c. Any semantically insignificant whitespace differences.
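As a rough illustration of constraint 4, here is a minimal Python sketch that compares the two parses while ignoring the exceptions listed above. It is only an approximation: it assumes Python's standard html.parser and xml.etree modules as stand-ins, html.parser is a tokenizer rather than a conforming HTML5 tree builder, and the names (EventCollector, html_events, xml_events, looks_equivalent) are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespaces whose attributes are excluded from the comparison (exception a).
XML_NS = "{http://www.w3.org/XML/1998/namespace}"
XLINK_NS = "{http://www.w3.org/1999/xlink}"


def norm_text(text):
    """Collapse whitespace runs; whitespace-only text becomes '' (exception c)."""
    return " ".join((text or "").split())


class EventCollector(HTMLParser):
    """Records a flat list of start/text/end events from the HTML tokenizer."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        kept = sorted(
            (name, value) for name, value in attrs
            if name != "xmlns"
            and not name.startswith(("xmlns:", "xml:", "xlink:"))
        )
        self.events.append(("start", tag, kept))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        text = norm_text(data)
        if text:
            self.events.append(("text", text))


def html_events(markup):
    collector = EventCollector()
    collector.feed(markup)
    collector.close()
    return collector.events


def local(name):
    """Strip an ElementTree '{namespace}' prefix from a tag or attribute name."""
    return name.rsplit("}", 1)[-1]


def xml_events(markup):
    events = []

    def walk(elem):
        kept = sorted(
            (local(key), value) for key, value in elem.attrib.items()
            if not key.startswith((XML_NS, XLINK_NS))
        )
        events.append(("start", local(elem.tag), kept))
        if norm_text(elem.text):
            events.append(("text", norm_text(elem.text)))
        for child in elem:
            walk(child)
            if norm_text(child.tail):
                events.append(("text", norm_text(child.tail)))
        events.append(("end", local(elem.tag)))

    walk(ET.fromstring(markup))
    return events


def looks_equivalent(markup):
    """True if both parses yield the same normalised event sequence."""
    return html_events(markup) == xml_events(markup)


DOC = (
    '<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">\n'
    "<body><p>One<br />two</p></body>\n"
    "</html>"
)
```

Calling looks_equivalent(DOC) compares the two event lists. The sketch does not handle comments, script or style content, or the HTML parser's implied elements, so it is no substitute for a real conformance checker.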
>> For example as both in HTML5 and in XML you have some variety in
>> choosing encoding, Polyglot must *normatively* define that only
>> allowed encoding is UTF-8.
>
> It can do that by reference; it doesn't need to do so explicitly.
> Clearly by the definition of polyglot HTML (being the overlap of
> text/html and XHTML) a conforming polyglot document needs to use an
> encoding which:
>
> * Is allowed in conforming text/html.
> * Is allowed in conforming XHTML.
> * Can be declared in a way which is conforming in both representations,
> and has the same meaning in both.
>
> If the only encoding that turns out to meet those requirements is UTF-8
> then it necessarily follows that polyglot HTML documents must use UTF-8.
UTF-8 is not the only encoding that meets those requirements. A
conforming HTML or XHTML document may use UTF-16 with a byte order mark,
or any encoding which is declared outside the document (e.g. in the HTTP
Content-Type header). The fact that, for implementations, UTF-8 is "the
only character encoding for which both HTML and XML require support"
does not affect the conformance of documents using alternative encodings
with respect to the requirements of either the HTML or XHTML serialisations.
There are certainly very good reasons to choose UTF-8 over the
alternatives, and I have no problem with Polyglot Markup recommending
UTF-8 non-normatively. But by requiring UTF-8, Polyglot Markup imposes an
additional constraint that goes beyond the requirements of HTML5.
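To make the UTF-16 case concrete, here is a small Python sketch of byte order mark sniffing, which is the in-band signal that both an HTML and an XML parser can use without any declaration in the markup. The helper name bom_encoding is hypothetical, and the sketch deliberately ignores UTF-32 and encodings declared out of band (e.g. in the HTTP Content-Type header).

```python
import codecs
from typing import Optional


def bom_encoding(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading byte order mark, if any.

    Simplification: a UTF-32 LE BOM also begins with the UTF-16 LE BOM bytes
    and would be misreported here; UTF-32 is out of scope for this sketch.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return None


doc = "<p>caf\u00e9</p>"

# The same document serialised as UTF-16 with a BOM, and as BOM-less UTF-8.
utf16_bytes = codecs.BOM_UTF16_LE + doc.encode("utf-16-le")
utf8_bytes = doc.encode("utf-8")

print(bom_encoding(utf16_bytes))
print(bom_encoding(utf8_bytes))
```

A document without a BOM yields None here, in which case the encoding has to come from a declaration or from outside the document.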
Another issue is the section talking about how to include scripts and
stylesheets. It is conforming in both HTML and XHTML to include scripts
inline, and Polyglot Markup's requirement to only link to external
scripts and stylesheets is another additional constraint that goes
beyond the requirements of HTML5. It's also somewhat self-contradictory
in its present state, as section 9 says to only use external scripts and
section 9.2 contradicts that by saying that "safe content" may be used
inline.
On a related note, Polyglot Markup also fails to describe alternative
techniques of including scripts inline, such as using the <![CDATA[ trick:

  <script>//<![CDATA[
  ...
  //]]></script>
With the caveat that the .textContent of the script element would differ
slightly between HTML and XML parser interpretations, and that polyglot
serialisers would need to ensure this is preserved correctly in output
when used, it's likely to be perfectly adequate for many applications of
polyglot markup.
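The textContent difference can be seen with a short Python sketch. It assumes Python's standard html.parser and xml.etree modules as stand-ins for real HTML and XML parsers, and the script body and helper names (ScriptText, html_script_text, xml_script_text) are invented for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# An inline script wrapped in the commented-out CDATA markers. The "<" in the
# script body would make the document ill-formed XML without the CDATA section.
MARKUP = "<script>//<![CDATA[\nvar ok = 1 < 2;\n//]]></script>"


class ScriptText(HTMLParser):
    """Collects the raw character data the HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_script_text(markup):
    parser = ScriptText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


def xml_script_text(markup):
    # The XML parser consumes the CDATA delimiters and keeps only their content.
    return ET.fromstring(markup).text


print(repr(html_script_text(MARKUP)))
print(repr(xml_script_text(MARKUP)))
```

The HTML side keeps the literal "<![CDATA[" and "]]>" markers inside the script text, where they sit behind "//" comments, while the XML side drops them and leaves two bare "//" comment lines. The executable statement in between is the same in both.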
--
Lachlan Hunt
http://lachy.id.au/
http://www.opera.com/
Received on Tuesday, 6 November 2012 13:37:45 UTC