- From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
- Date: Tue, 06 Nov 2012 14:37:17 +0100
- To: public-html@w3.org
On 2012-11-05 15:04, Smylers wrote:
>>> Surely the definition of polygot mark-up is simply a statement
>>> saying something along the lines of[*1] a document is conforming
>>> polyglot if it conforms to both the XML and text/html requirements
>>> of HTML5 and has the same meaning in both serializations -- that is,
>>> it's a definition of the principle, by reference.
>>>
>>> All the details and implications of what that means are simply
>>> applying the normative requirements of the HTML spec, so they aren't
>>> themselves defining anything.
>
> * The definition of the term "polyglot markup" being normative (it
>   currently isn't) and itself refer to normative definitions in the HTML
>   spec.
>
> * The consequences of that definition, the description of what it means,
>   not being normative (they currently claim to be).
>
> Would you be satisfied with that, or do you want the description parts
> to be normative as well?

Subject to the condition that the spec clearly states that everything
else in the document is non-normative, I would be satisfied with a
normative definition of the term "polyglot markup" (or similar) as being
markup that conforms with the intersection of the HTML and XHTML
serialisations, such that the markup meets the following constraints:

1. Conforms to the syntactic requirements of the HTML serialisation
2. Conforms to the syntactic requirements of the XHTML serialisation
   (including well-formedness)
3. Results in a *conforming document* when parsed with either an HTML or
   XML parser
4. Results in equivalent tree representations (e.g. DOM) when parsed
   using either HTML or XML parsers, subject to the known exceptions for:
   a. xml, xmlns and xlink namespaced attributes,
   b. Any insignificant differences in the value of textContent for
      script and style elements.
   c. Any semantically insignificant whitespace differences.
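As a rough illustration of constraint 4, the sketch below parses the same
source once with Python's html.parser and once with an XML parser, then
compares the resulting element/attribute/text events. Everything here is
an assumption made for illustration: the names keep_attr, html_events,
xml_events and same_events are hypothetical, html.parser is only a
tokenizer and not a conforming HTML5 tree builder, and all namespaced
attributes on the XML side are dropped as a simplification of
exception (a).

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

def keep_attr(name):
    # Exception (a), simplified: drop xmlns, xml:*, xlink:* and any
    # attribute that ElementTree reports in "{namespace}local" form.
    return not (name == "xmlns"
                or name.startswith(("xmlns:", "xml:", "xlink:", "{")))

class _Events(HTMLParser):
    """Collects a flat event list from the text/html side."""
    def __init__(self):
        super().__init__()
        self.events = []
    def handle_starttag(self, tag, attrs):
        kept = sorted((k, v) for k, v in attrs if keep_attr(k))
        self.events.append(("start", tag, kept))
    def handle_endtag(self, tag):
        self.events.append(("end", tag))
    def handle_data(self, data):
        # Exception (c), simplified: ignore whitespace-only text.
        if data.strip():
            self.events.append(("text", data.strip()))

def html_events(source):
    parser = _Events()
    parser.feed(source)
    parser.close()
    return parser.events

def xml_events(elem):
    """Walks an ElementTree element and yields the same event shapes."""
    tag = elem.tag.replace(XHTML_NS, "")
    kept = sorted((k, v) for k, v in elem.attrib.items() if keep_attr(k))
    yield ("start", tag, kept)
    if elem.text and elem.text.strip():
        yield ("text", elem.text.strip())
    for child in elem:
        yield from xml_events(child)
        if child.tail and child.tail.strip():
            yield ("text", child.tail.strip())
    yield ("end", tag)

def same_events(source):
    try:
        root = ET.fromstring(source)
    except ET.ParseError:
        return False  # not well-formed, so constraint 2 already fails
    return html_events(source) == list(xml_events(root))

POLYGLOT_DOC = (
    '<!DOCTYPE html>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">\n'
    '<head><meta charset="utf-8"/><title>Test</title></head>\n'
    '<body><p>Hello<br/>world</p></body>\n'
    '</html>'
)
```

Under these assumptions, same_events(POLYGLOT_DOC) is true, while an
upper-case element such as "<P>x</P>" compares unequal because the HTML
side lower-cases tag names and the XML side does not. A real check would
need a conforming HTML5 parser; this only shows the shape of the
comparison.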
>> For example as both in HTML5 and in XML you have some variety in
>> choosing encoding, Polyglot must *normatively* define that only
>> allowed encoding is UTF-8.
>
> It can do that by reference; it doesn't need to so it explicitly.
> Clearly by the definition polyglot HTML (being the overlap of text/html
> and XHTML) a conforming polyglot document needs to use an encoding
> which:
>
> * Is allowed in conforming text/html.
> * Is allowed in conforming XHTML.
> * Can be declared in a way which is conforming in both representations,
>   and has the same meaning in both.
>
> If the only encoding that turns out to meets those requirements is UTF-8
> then it necessarily follows that polyglot HTML documents must use UTF-8.

UTF-8 is not the only encoding that meets those requirements. A
conforming HTML or XHTML document may use UTF-16 with a byte order mark,
or any encoding which is declared outside the document (e.g. in the HTTP
Content-Type header).

The fact that, for implementations, UTF-8 is "the only character
encoding for which both HTML and XML require support" does not affect
the conformance of documents using alternative encodings with respect to
the requirements of either the HTML or XHTML serialisations.

There are certainly very good reasons to choose UTF-8 over the
alternatives, and I have no problem with the spec recommending UTF-8
non-normatively. But by requiring UTF-8, Polyglot Markup is imposing an
additional constraint that goes beyond the requirements of HTML5.

Another issue is the section talking about how to include scripts and
stylesheets. It is conforming in both HTML and XHTML to include scripts
inline, and Polyglot Markup's requirement to only link to external
scripts and stylesheets is another additional constraint that goes
beyond the requirements of HTML5.

It's also somewhat self-contradictory in its present state, as section 9
says to only use external scripts and section 9.2 contradicts that by
saying that "safe content" may be used inline.
On a related note, Polyglot Markup also fails to describe alternative
techniques of including scripts inline, such as using the <![CDATA[
trick.

  <script>//<![CDATA[
  ...
  //]]></script>

With the caveat that the .textContent of the script element would differ
slightly between HTML and XML parser interpretations, and that polyglot
serialisers would need to ensure this is preserved correctly in output
when used, it's likely to be perfectly adequate for many applications of
polyglot markup.

-- 
Lachlan Hunt
http://lachy.id.au/
http://www.opera.com/
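To make the .textContent caveat for the <![CDATA[ trick concrete, here is
a minimal sketch that feeds one script element to both kinds of parser
and collects its text. It assumes Python's html.parser and
xml.etree.ElementTree are acceptable stand-ins for the two parser
families; the sample script body and the names ScriptText,
html_script_text and xml_script_text are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical script body; "<" and "&&" would be ill-formed XML
# without the CDATA wrapper.
SOURCE = "<script>//<![CDATA[\nvar ok = 1 < 2 && true;\n//]]></script>"

class ScriptText(HTMLParser):
    """Collects the raw text that html.parser reports inside <script>."""
    def __init__(self):
        super().__init__()
        self.chunks = []
    def handle_data(self, data):
        self.chunks.append(data)

def html_script_text(source):
    parser = ScriptText()
    parser.feed(source)
    parser.close()
    return "".join(parser.chunks)

def xml_script_text(source):
    # The XML parser consumes the CDATA markers and keeps only their content.
    return ET.fromstring(source).text

html_text = html_script_text(SOURCE)
xml_text = xml_script_text(SOURCE)
```

On the HTML side the "<![CDATA[" and "]]>" markers stay in the script text
(hidden from the script engine by the "//" comments); on the XML side
they are removed, leaving only the "//" prefixes. The executable line is
identical in both, which is the "slight" difference the caveat refers to.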
Received on Tuesday, 6 November 2012 13:37:45 UTC